Automatically Hash Tagging Text With ASP.NET Web Forms (VB.NET)
I had previously blogged a solution in PHP to automatically hash tag an input string with various terms stored in a database. Here’s an ASP.NET Web Forms version of the same solution (this one should work for ASP.NET 2, 3.5 and 4).
To review, a hash tag is a bit of text, prefixed with a hash mark (#), that tells some Web sites and services (notably Twitter) that the word thus marked should be treated as a tag. This code will take some piece of input text, search it for terms we generally want to tag, and mark each instance in that input string with a hash mark.
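As in the previous solution, we’ll define a “word” for the purposes of this demo to be any sequence of letters, digits or underscores that is followed by whitespace (a space or a newline, for example) or by the end of the input. A word followed directly by punctuation, such as a period or a semicolon, is therefore not picked up. Also, this demo will only tag text; it won’t automatically add new terms to the database. That will be the subject of an upcoming post.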
I’m going to use three ArrayLists as the workhorses for this solution. One will hold the terms from the database; the second will contain all the distinct words in the input string; and the third, the words from the input string that are hashtag terms.
The HTML / ASP.NET Controls
For the purposes of this demo, we’ll just have a form with a Label (to show results or error messages), a TextBox (to provide the input text), a RequiredFieldValidator and a Button. I’ll also output in a DataList, bound to a SqlDataSource, all the terms stored in the database at the moment.
<h2>Automatically Hashtagging An Input String</h2>
<p><asp:Label runat="server" ID="lblResult" Text='Enter some text in the box, then click the submit button. Results will be shown here.' /></p>
<asp:TextBox runat="server" ID="tbInput" TextMode="MultiLine" Rows="10" Columns="50" Text="Amazon uses HTML5 and JavaScript; Google owns YouTube." />
<asp:RequiredFieldValidator runat="server" ID="rfvInput" ControlToValidate="tbInput" ErrorMessage='<br />Please provide some text.' CssClass="warning" Display="Dynamic" />
<br />
<asp:Button runat="server" ID="btnSubmit" Text="Submit" />
<h4>Terms in the database</h4>
<asp:DataList runat="server" ID="dlTerms" DataSourceID="sqlTerms" RepeatColumns="10" RepeatDirection="Horizontal" CellPadding="5" CellSpacing="0" ItemStyle-BorderColor="Black" ItemStyle-BorderWidth="1">
<ItemTemplate>
<%#Eval("term_text")%>
</ItemTemplate>
</asp:DataList>
<asp:SqlDataSource runat="server" ID="sqlTerms" SelectCommand="your stored procedure" SelectCommandType="StoredProcedure" ConnectionString="<%$ ConnectionStrings:your connection string%>" />
The GetAllTerms Function
First up, a function that retrieves from the database all the hashtag terms and returns them as an ArrayList.
Function GetAllTerms() As ArrayList
    'retrieves all terms from the database
    'returns empty ArrayList on error,
    'populated ArrayList on success
    'requires Imports for System.Configuration, System.Data and System.Data.SqlClient
    Dim arrOut As New ArrayList()
    Try
        'Using blocks close and dispose each object, even if an exception is thrown
        Using objConn As New SqlConnection(ConfigurationManager.ConnectionStrings("your connection string").ConnectionString)
            Using objCmd As New SqlCommand("your stored procedure", objConn)
                objCmd.CommandType = CommandType.StoredProcedure
                objConn.Open()
                Using objReader As SqlDataReader = objCmd.ExecuteReader()
                    While objReader.Read()
                        arrOut.Add(objReader(0))
                    End While
                End Using
            End Using
        End Using
    Catch ex As Exception
        'discard any partial results so the caller sees an empty list on error
        arrOut.Clear()
    End Try
    Return arrOut
End Function
The ExtractTerms Function
Now we need to get all the potentially taggable words in our input string.
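Function ExtractTerms(ByVal strInput As String) As ArrayList
    'extracts all distinct words from strInput
    'returns them as ArrayList, empty ArrayList if no words are found
    'words are any alphanumeric sequence before a space or newline
    'requires Imports System.Text.RegularExpressions
    Dim arrOut As New ArrayList()
    Dim reWords As New Regex("\w+(\s|$)", RegexOptions.IgnoreCase Or RegexOptions.CultureInvariant)
    'search the string passed in, not the TextBox directly
    Dim reMatches As MatchCollection = reWords.Matches(strInput)
    For Each reMatch As Match In reMatches
        Dim strWord As String = reMatch.Value.Trim()
        'skip repeats, so a repeated word is not hash-tagged twice later
        If Not arrOut.Contains(strWord) Then
            arrOut.Add(strWord)
        End If
    Next
    Return arrOut
End Function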
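To see what that definition of a “word” actually picks up, here is a minimal sketch that runs a pattern against a string outside of any page. MatchWords and the WordPatternSketch module are hypothetical names I made up for illustration; they are not part of the demo code.

```vbnet
Imports System.Collections
Imports System.Text.RegularExpressions

Public Module WordPatternSketch
    'hypothetical helper, not part of the demo:
    'returns the distinct, trimmed matches of strPattern in strInput
    Public Function MatchWords(ByVal strInput As String, ByVal strPattern As String) As ArrayList
        Dim arrOut As New ArrayList()
        For Each reMatch As Match In Regex.Matches(strInput, strPattern)
            Dim strWord As String = reMatch.Value.Trim()
            If Not arrOut.Contains(strWord) Then arrOut.Add(strWord)
        Next
        Return arrOut
    End Function
End Module
```

Run against the demo’s default text, "Amazon uses HTML5 and JavaScript; Google owns YouTube.", the pattern "\w+(\s|$)" yields six words: Amazon, uses, HTML5, and, Google, owns. JavaScript and YouTube are skipped because punctuation follows them directly. If you would rather catch those too, a word-boundary pattern such as "\b\w+\b" is one possible alternative; on the same text it yields all eight words.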
The CompareLists Function
Now that we have ArrayLists with all terms and all words, we can compare the two, and create an ArrayList that contains the words we intend to tag.
Note that we convert the terms and words to lower case for the comparison, but return each word exactly as it appears in the input text. That’s because we want to preserve case in the input string. That’s also why I can’t use the ArrayList.Contains method; it’s case-sensitive when comparing strings. (Actually, that can be overridden or worked around, but it’s involved and somewhat complicated, so it’s a subject for some other column.)
Function CompareLists(ByVal arrTerms As ArrayList, ByVal arrWords As ArrayList) As ArrayList
    'compares term list against word list
    'returns ArrayList with all words found in terms
    'maintains case
    Dim arrOut As New ArrayList()
    For Each strWord As String In arrWords
        For Each strTerm As String In arrTerms
            If strTerm.ToLower = strWord.ToLower Then
                arrOut.Add(strWord)
                Exit For
            End If
        Next
    Next
    Return arrOut
End Function
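As an aside, the lower-casing can be avoided by comparing with String.Equals and StringComparison.OrdinalIgnoreCase, which ignores case without creating lower-cased copies and without depending on the server’s current culture. What follows is only a sketch of that variation; CompareListsIgnoreCase and the CompareSketch module are hypothetical names, not part of the demo.

```vbnet
Imports System
Imports System.Collections

Public Module CompareSketch
    'hypothetical variant of CompareLists, not part of the demo
    'returns the input words that match a term, ignoring case
    Public Function CompareListsIgnoreCase(ByVal arrTerms As ArrayList, ByVal arrWords As ArrayList) As ArrayList
        Dim arrOut As New ArrayList()
        For Each strWord As String In arrWords
            For Each strTerm As String In arrTerms
                If String.Equals(strTerm, strWord, StringComparison.OrdinalIgnoreCase) Then
                    'keep the word as typed, so case is preserved
                    arrOut.Add(strWord)
                    Exit For
                End If
            Next
        Next
        Return arrOut
    End Function
End Module
```

Given the terms google and amazon and the words Google and owns, this returns a list containing only Google, spelled as it was in the input.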
The AutoTagSubject Function
Finally, we need a function that will take the words we want autotagged, and apply the autotagging to the input string.
Note that before we proceed with tagging, we remove all current hash marks from the input string. That’s to avoid double-hashing words that may have been tagged in the input text.
Function AutoTagSubject(ByVal strInput As String, ByVal arrTerms As ArrayList) As String
    'applies arrTerms as hashtags to strInput
    'removes hashtags first to avoid double-tagging
    Dim strOut As String = strInput
    strOut = strOut.Replace("#", "")
    For Each strTerm As String In arrTerms
        strOut = strOut.Replace(strTerm, "#" & strTerm)
    Next
    Return strOut
End Function
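One caveat: String.Replace matches substrings, so tagging the term Java would also put a hash mark in front of JavaScript. If that matters for your terms, here is a minimal sketch of a whole-word variation that swaps String.Replace for Regex.Replace. AutoTagWholeWords and the HashTagSketch module are hypothetical names, and this is not the code the demo uses.

```vbnet
Imports System.Collections
Imports System.Text.RegularExpressions

Public Module HashTagSketch
    'hypothetical variant of AutoTagSubject, not part of the demo
    'tags a term only where it stands alone as a whole word
    Public Function AutoTagWholeWords(ByVal strInput As String, ByVal arrTerms As ArrayList) As String
        'remove existing hash marks first, as AutoTagSubject does
        Dim strOut As String = strInput.Replace("#", "")
        For Each strTerm As String In arrTerms
            'the term must not be preceded by a word character or a hash mark,
            'and must not be followed by a word character
            Dim strPattern As String = "(?<![\w#])" & Regex.Escape(strTerm) & "(?!\w)"
            '$& in the replacement stands for the matched text
            strOut = Regex.Replace(strOut, strPattern, "#$&")
        Next
        Return strOut
    End Function
End Module
```

With the single term Java, the input "Java and JavaScript" becomes "#Java and JavaScript". Because the pattern refuses a match that already follows a hash mark, a term that appears twice in the list is still tagged only once.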
The btnSubmit_click Subroutine
We now need a simple subroutine to invoke our functions and autotag the input text.
Sub btnSubmit_click(ByVal Sender As Object, ByVal E As EventArgs) Handles btnSubmit.Click
    'get terms from database
    Dim arrTerms As ArrayList = GetAllTerms()
    'get unique words from input text
    Dim arrWords As ArrayList = ExtractTerms(tbInput.Text)
    If arrTerms.Count < 1 Then
        lblResult.Text = "There are no terms in the database, or there was an error retrieving the terms."
        lblResult.CssClass = "warning"
    ElseIf arrWords.Count < 1 Then
        lblResult.Text = "There are no words in the string to be tagged."
        lblResult.CssClass = "warning"
    Else
        'get matches between terms and input words
        Dim arrHashes As ArrayList = CompareLists(arrTerms, arrWords)
        If arrHashes.Count < 1 Then
            lblResult.Text = "There were no matches between the input text and the terms in the database."
            lblResult.CssClass = ""
        Else
            'display found terms
            Dim sbMsg As New StringBuilder("The following terms were found: ")
            For Each strTerm As String In arrHashes
                sbMsg.Append(strTerm)
                sbMsg.Append(", ")
            Next
            'trim the trailing comma and space
            sbMsg.Remove(sbMsg.Length - 2, 2)
            lblResult.Text = sbMsg.ToString()
            lblResult.CssClass = ""
            'autotag input string
            tbInput.Text = AutoTagSubject(tbInput.Text, arrHashes)
        End If
    End If
End Sub
And that’s all there is to it. You can see a working demo at http://www.dougv.net/demos/auto_hashtag/
You can also download the demo code. I distribute code under the GNU GPL.