I previously blogged a PHP solution that automatically hash tags an input string with terms stored in a database. Here’s an ASP.NET Web Forms version of the same solution, written in VB.NET (it should work with ASP.NET 2, 3.5 and 4).
To review, a hash tag is a bit of text prefixed with a hash mark (#). The mark tells some Web sites and services, notably Twitter, that the word should be treated as a tag. This code takes a piece of input text, searches it for terms we generally want to tag, and marks each instance in the input string with a hash mark.
As in the previous solution, we’ll define a “word” for the purposes of this demo as any sequence of word characters (letters, digits and underscores) that is followed by whitespace, such as a space or a newline, or by the end of the input. Also, this demo will only tag text; it won’t automatically add new terms to the database. That will be the subject of an upcoming post.
I’m going to use three ArrayLists as the workhorses for this solution. One will hold the terms from the database; the second will contain all the distinct words in the input string; and the third, the words from the input string that are hashtag terms.
The HTML / ASP.NET Controls
For the purposes of this demo, we’ll just have a form with a Label (to show results or error messages), TextBox (to provide the input text), a RequiredFieldValidator and a Button. I’ll also output in a DetailsView, bound to a SqlDataSource, all the terms stored in the database at the moment.
The GetAllTerms Function
First up, a function that retrieves all the hashtag terms from the database and returns them as an ArrayList.
    'requires Imports System.Data and Imports System.Data.SqlClient
    Function GetAllTerms() As ArrayList
        'retrieves all terms from the database
        'returns empty ArrayList on error,
        'populated ArrayList on success
        Dim arrOut As New ArrayList()
        Try
            Using objConn As New SqlConnection(ConfigurationManager.ConnectionStrings("your connection string").ConnectionString)
                Using objCmd As New SqlCommand("your stored procedure", objConn)
                    objCmd.CommandType = CommandType.StoredProcedure
                    objConn.Open()
                    Using objReader As SqlDataReader = objCmd.ExecuteReader()
                        While objReader.Read()
                            'first column of each row holds the term
                            arrOut.Add(objReader(0).ToString())
                        End While
                    End Using
                End Using
            End Using
        Catch ex As Exception
            'honor the contract above: return an empty list on any error
            arrOut.Clear()
        End Try
        Return arrOut
    End Function
The ExtractTerms Function
Now we need to get all the potentially taggable words in our input string.
    Function ExtractTerms(ByVal strInput As String) As ArrayList
        'extracts all distinct words from strInput
        'returns them as ArrayList, empty ArrayList if none are found
        'words are any alphanumeric sequence before whitespace or the end of the input
        Dim arrOut As New ArrayList()
        Dim reWords As New Regex("\w+(\s|$)", RegexOptions.IgnoreCase Or RegexOptions.CultureInvariant)
        'match against the parameter, not the TextBox directly
        Dim reMatches As MatchCollection = reWords.Matches(strInput)
        For Each reMatch As Match In reMatches
            Dim strWord As String = reMatch.Value.Trim()
            'skip exact repeats so a word is only tagged once later
            If Not arrOut.Contains(strWord) Then
                arrOut.Add(strWord)
            End If
        Next
        Return arrOut
    End Function
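To make the “word” definition concrete, here is a small console sketch that runs the same pattern against a sample sentence. The module name, the ListWords helper and the sample text are mine, made up for illustration; only the pattern comes from ExtractTerms.

```vb
Imports System
Imports System.Collections.Generic
Imports System.Text.RegularExpressions

Module WordPatternSketch

    ' Hypothetical helper: returns the matched words as a comma-separated string.
    Function ListWords(ByVal strInput As String) As String
        ' Same pattern as ExtractTerms: word characters followed by whitespace or end of input
        Dim reWords As New Regex("\w+(\s|$)")
        Dim lstWords As New List(Of String)()
        For Each reMatch As Match In reWords.Matches(strInput)
            lstWords.Add(reMatch.Value.Trim())
        Next
        Return String.Join(",", lstWords.ToArray())
    End Function

    Sub Main()
        ' Sample text, not from the demo
        Console.WriteLine(ListWords("Learning ASP.NET and PHP today"))
        ' Prints: Learning,NET,and,PHP,today
    End Sub

End Module
```

Note that "ASP" is skipped: it is followed by a period rather than whitespace, so it does not count as a word under this definition. The same goes for any word with trailing punctuation.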
The CompareLists Function
Now that we have ArrayLists with all terms and all words, we can compare the two, and create an ArrayList that contains the words we intend to tag.
Note that we take care to convert the terms and words to lower-case before comparing them, and return the input text version of any terms found. That’s because we want to preserve case in the input string. That’s also why I can’t use the ArrayList.Contains method; it’s case-sensitive when comparing strings. (Actually, that can be overridden or worked around; but it’s involved and somewhat complicated, so it’s also a subject for some other column.)
    Function CompareLists(ByVal arrTerms As ArrayList, ByVal arrWords As ArrayList) As ArrayList
        'compares term list against word list
        'returns ArrayList with all words found in terms
        'maintains case
        Dim arrOut As New ArrayList()
        For Each strWord As String In arrWords
            For Each strTerm As String In arrTerms
                'compare in lower-case, but keep the word as typed
                If strTerm.ToLower() = strWord.ToLower() Then
                    arrOut.Add(strWord)
                    Exit For
                End If
            Next
        Next
        Return arrOut
    End Function
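As an aside on the Contains workaround mentioned above, one possible approach is to load the terms into a generic HashSet built with a case-insensitive comparer. This is only a sketch, not the demo's code: the CompareListsWithSet name is hypothetical, and HashSet(Of T) assumes .NET 3.5 or later, so it would not cover the ASP.NET 2 case.

```vb
Imports System
Imports System.Collections
Imports System.Collections.Generic

Module CaseInsensitiveCompareSketch

    ' Hypothetical alternative to CompareLists: same inputs and output,
    ' but the term lookup uses a case-insensitive HashSet (.NET 3.5 or later).
    Function CompareListsWithSet(ByVal arrTerms As ArrayList, ByVal arrWords As ArrayList) As ArrayList
        Dim setTerms As New HashSet(Of String)(StringComparer.OrdinalIgnoreCase)
        For Each objTerm As Object In arrTerms
            setTerms.Add(objTerm.ToString())
        Next

        Dim arrOut As New ArrayList()
        For Each objWord As Object In arrWords
            Dim strWord As String = objWord.ToString()
            If setTerms.Contains(strWord) Then
                arrOut.Add(strWord)   ' keep the casing used in the input text
            End If
        Next
        Return arrOut
    End Function

    Sub Main()
        ' Sample data, not from the demo database
        Dim arrTerms As New ArrayList(New String() {"aspnet", "php"})
        Dim arrWords As New ArrayList(New String() {"Learning", "ASPNET", "and", "PHP"})
        For Each strWord As String In CompareListsWithSet(arrTerms, arrWords)
            Console.WriteLine(strWord)
        Next
    End Sub

End Module
```

With the sample data, this prints ASPNET and PHP, each in the casing used in the input. The lookup replaces the inner loop, which may matter if the term list grows large.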
The AutoTagSubject Function
Finally, we need a function that will take the words we want autotagged, and apply the autotagging to the input string.
Note that before we proceed with tagging, we remove all current hash marks from the input string. That’s to avoid double-hashing words that may have been tagged in the input text.
    Function AutoTagSubject(ByVal strInput As String, ByVal arrTerms As ArrayList) As String
        'applies arrTerms as hashtags to strInput
        'removes hashtags first to avoid double-tagging
        Dim strOut As String = strInput
        strOut = strOut.Replace("#", "")
        For Each strTerm As String In arrTerms
            strOut = strOut.Replace(strTerm, "#" & strTerm)
        Next
        Return strOut
    End Function
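One thing to be aware of: String.Replace matches substrings, so a term such as "net" would also be tagged inside a longer word such as "internet". If that matters for your terms, a regular expression with lookarounds is one way to restrict tagging to whole words. The sketch below is a variation of my own, with a hypothetical AutoTagWholeWords name; it is not what the demo uses.

```vb
Imports System
Imports System.Collections
Imports System.Text.RegularExpressions

Module WholeWordTagSketch

    ' Hypothetical variant of AutoTagSubject that tags whole words only.
    Function AutoTagWholeWords(ByVal strInput As String, ByVal arrTerms As ArrayList) As String
        ' Remove existing hash marks first, as AutoTagSubject does
        Dim strOut As String = strInput.Replace("#", "")
        For Each objTerm As Object In arrTerms
            ' (?<![\w#]) : not preceded by a word character or an existing hash mark
            ' (?!\w)     : not followed by a word character
            Dim strPattern As String = "(?<![\w#])" & Regex.Escape(objTerm.ToString()) & "(?!\w)"
            ' $& stands for the matched text, so this prefixes it with #
            strOut = Regex.Replace(strOut, strPattern, "#$&")
        Next
        Return strOut
    End Function

    Sub Main()
        ' Sample data, not from the demo
        Dim arrTerms As New ArrayList(New String() {"net"})
        Console.WriteLine(AutoTagWholeWords("the net and the internet", arrTerms))
        ' Prints: the #net and the internet
    End Sub

End Module
```

Because the lookbehind also rejects a preceding hash mark, a term that appears twice in arrTerms is still tagged only once.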
The btnSubmit_click Subroutine
We now need a simple subroutine to invoke our functions and autotag the input text.
    Sub btnSubmit_click(ByVal Sender As Object, ByVal E As EventArgs) Handles btnSubmit.Click
        'get terms from database
        Dim arrTerms As ArrayList = GetAllTerms()
        'get unique words from input text
        Dim arrWords As ArrayList = ExtractTerms(tbInput.Text)

        If arrTerms.Count < 1 Then
            lblResult.Text = "There are no terms in the database, or there was an error retrieving the terms."
            lblResult.CssClass = "warning"
        ElseIf arrWords.Count < 1 Then
            lblResult.Text = "There are no words in the string to be tagged."
            lblResult.CssClass = "warning"
        Else
            'get matches between terms and input words
            Dim arrHashes As ArrayList = CompareLists(arrTerms, arrWords)
            If arrHashes.Count < 1 Then
                lblResult.Text = "There were no matches between the input text and the terms in the database."
                lblResult.CssClass = ""
            Else
                'display found terms
                Dim sbMsg As New StringBuilder("The following terms were found: ")
                For Each strTerm As String In arrHashes
                    sbMsg.Append(strTerm)
                    sbMsg.Append(", ")
                Next
                'drop the trailing comma and space
                sbMsg.Remove(sbMsg.Length - 2, 2)
                lblResult.Text = sbMsg.ToString()
                lblResult.CssClass = ""
                'autotag input string
                tbInput.Text = AutoTagSubject(tbInput.Text, arrHashes)
            End If
        End If
    End Sub
And that’s all there is to it. You can see a working demo at http://www.dougv.net/demos/auto_hashtag/
You can also download the demo code. I distribute code under the GNU GPL.
All links in this post on delicious: http://www.delicious.com/dougvdotcom/automatically-hash-tagging-text-with-asp-net-web-forms-vb-net