Automatically Hash Tagging Text With ASP.NET Web Forms (VB.NET)

I had previously blogged a solution in PHP to automatically hash tag an input string with various terms stored in a database. Here’s an ASP.NET Web Forms version of the same solution (this one should work for ASP.NET 2, 3.5 and 4).

To review, a hash tag is a bit of text, prefixed with a hash mark (#), that tells some Web sites and services (notably, Twitter) that the marked word should be treated as a tag. This code takes a piece of input text, searches it for terms we generally want to tag, and marks each instance in that input string with a hash mark.

As in the previous solution, we’ll define a “word” for the purposes of this demo to be any alphanumeric character sequence that is followed by whitespace (such as a space or a newline) or by the end of the string. A word followed directly by punctuation doesn’t qualify. Also, this demo will only tag text; it won’t automatically add new terms to the database. That will be the subject of an upcoming post.

I’m going to use three ArrayLists as the workhorses for this solution. One will hold the terms from the database; the second will contain all the distinct words in the input string; and the third, the words from the input string that are hashtag terms.

The HTML / ASP.NET Controls

For the purposes of this demo, we’ll just have a form with a Label (to show results or error messages), a TextBox (to provide the input text), a RequiredFieldValidator and a Button. I’ll also output in a DataList, bound to a SqlDataSource, all the terms stored in the database at the moment.

<h2>Automatically Hashtagging An Input String</h2>
<p><asp:Label runat="server" ID="lblResult" Text='Enter some text in the box, then click the submit button. Results will be shown here.' /></p>
<asp:TextBox runat="server" ID="tbInput" TextMode="MultiLine" Rows="10" Columns="50" Text="Amazon uses HTML5 and JavaScript; Google owns YouTube."  />
<asp:RequiredFieldValidator runat="server" ID="rfvInput" ControlToValidate="tbInput" ErrorMessage='<br />Please provide some text.' CssClass="warning" Display="Dynamic" />
<br />
<asp:Button runat="server" ID="btnSubmit" Text="Submit" />

<h4>Terms in the database</h4>
<asp:DataList runat="server" ID="dlTerms" DataSourceID="sqlTerms" RepeatColumns="10" RepeatDirection="Horizontal" CellPadding="5" CellSpacing="0" ItemStyle-BorderColor="Black" ItemStyle-BorderWidth="1">
	<ItemTemplate>
		<%#Eval("term_text")%>
	</ItemTemplate>
</asp:DataList>

<asp:SqlDataSource runat="server" ID="sqlTerms" SelectCommand="your stored procedure" SelectCommandType="StoredProcedure" ConnectionString="<%$ ConnectionStrings:your connection string%>" />

The GetAllTerms Function

First up, a function that retrieves all the hashtag terms from the database and returns them as an ArrayList.

Function GetAllTerms() As ArrayList
	'retrieves all terms from the database
	'returns empty ArrayList on error,
	'populated ArrayList on success
	'needs the System.Data, System.Data.SqlClient and System.Configuration namespaces imported

	Dim arrOut As New ArrayList()

	Try
		'Using blocks close and dispose the connection, command and reader, even on error
		Using objConn As New SqlConnection(ConfigurationManager.ConnectionStrings("your connection string").ConnectionString)
			Using objCmd As New SqlCommand("your stored procedure", objConn)
				objCmd.CommandType = CommandType.StoredProcedure
				objConn.Open()
				Using objReader As SqlDataReader = objCmd.ExecuteReader()
					While objReader.Read()
						'first column holds the term text
						arrOut.Add(objReader(0).ToString())
					End While
				End Using
			End Using
		End Using
	Catch ex As Exception
		'on any error, fall back to an empty list, as promised above
		arrOut.Clear()
	End Try

	Return arrOut
End Function

The ExtractTerms Function

Now we need to get all the potentially taggable words in our input string.

Function ExtractTerms(ByVal strInput As String) As ArrayList
	'extracts all distinct words from strInput
	'returns them as ArrayList, empty ArrayList if no words are found
	'words are any alphanumeric sequence before whitespace or the end of the string

	Dim arrOut As New ArrayList()

	Dim reWords As New Regex("\w+(\s|$)", RegexOptions.IgnoreCase Or RegexOptions.CultureInvariant)
	Dim reMatches As MatchCollection = reWords.Matches(strInput)

	For Each reMatch As Match In reMatches
		Dim strWord As String = reMatch.Value.Trim()
		'add each word only once, so a repeated word isn't tagged twice later
		If Not arrOut.Contains(strWord) Then
			arrOut.Add(strWord)
		End If
	Next

	Return arrOut
End Function
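
To see what this definition of a word captures in practice, here is a minimal console sketch that runs the same pattern against the demo's default text. The module name and the MatchWords helper are hypothetical and not part of the demo; the sketch assumes a plain console project rather than a Web Forms page.

```vbnet
Imports System
Imports System.Collections
Imports System.Text.RegularExpressions

Module WordPatternSketch
	'hypothetical helper: applies the same pattern as ExtractTerms to any string
	Function MatchWords(ByVal strInput As String) As ArrayList
		Dim arrOut As New ArrayList()
		For Each reMatch As Match In Regex.Matches(strInput, "\w+(\s|$)")
			'the match includes the trailing whitespace, so trim it off
			arrOut.Add(reMatch.Value.Trim())
		Next
		Return arrOut
	End Function

	Sub Main()
		Dim arrWords As ArrayList = MatchWords("Amazon uses HTML5 and JavaScript; Google owns YouTube.")
		'prints: Amazon, uses, HTML5, and, Google, owns
		Console.WriteLine(String.Join(", ", CType(arrWords.ToArray(GetType(String)), String())))
	End Sub
End Module
```

Note that JavaScript and YouTube are missing from the output: each is followed directly by punctuation, so neither counts as a word under this demo's definition.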

The CompareLists Function

Now that we have ArrayLists with all terms and all words, we can compare the two, and create an ArrayList that contains the words we intend to tag.

Note that we convert the terms and words to lower case only for the comparison, and return the input text’s version of any term found. That’s because we want to preserve case in the input string. It’s also why I can’t use the ArrayList.Contains method; it’s case-sensitive when comparing strings. (Actually, that can be overridden or worked around, but it’s involved and somewhat complicated, so it’s also a subject for some other column.)

Function CompareLists(ByVal arrTerms As ArrayList, ByVal arrWords As ArrayList) As ArrayList
	'compares term list against word list
	'returns ArrayList with all words found in terms
	'maintains case

	Dim arrOut As New ArrayList()

	For Each strWord As String In arrWords
		For Each strTerm As String In arrTerms
			If strTerm.ToLower = strWord.ToLower Then
				arrOut.Add(strWord)
				Exit For
			End If
		Next
	Next

	Return arrOut
End Function
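
If the nested loops ever become a bottleneck with a large term list, one possible alternative is to load the terms into a Dictionary keyed with a case-insensitive comparer, then look up each word once. This is only a sketch under that assumption, not the approach the demo uses; the module name and CompareListsIgnoreCase are hypothetical.

```vbnet
Imports System
Imports System.Collections
Imports System.Collections.Generic

Module CompareSketch
	'hypothetical variant of CompareLists: same inputs, same output,
	'but uses a case-insensitive dictionary instead of ToLower comparisons
	Function CompareListsIgnoreCase(ByVal arrTerms As ArrayList, ByVal arrWords As ArrayList) As ArrayList
		Dim dictTerms As New Dictionary(Of String, Boolean)(StringComparer.OrdinalIgnoreCase)
		For Each strTerm As String In arrTerms
			'the indexer overwrites duplicates instead of throwing
			dictTerms(strTerm) = True
		Next

		Dim arrOut As New ArrayList()
		For Each strWord As String In arrWords
			'keep the word exactly as it was typed in the input text
			If dictTerms.ContainsKey(strWord) Then
				arrOut.Add(strWord)
			End If
		Next
		Return arrOut
	End Function

	Sub Main()
		Dim arrTerms As New ArrayList(New String() {"google", "html5"})
		Dim arrWords As New ArrayList(New String() {"Amazon", "uses", "HTML5", "Google"})
		Dim arrFound As ArrayList = CompareListsIgnoreCase(arrTerms, arrWords)
		'prints: HTML5, Google
		Console.WriteLine(String.Join(", ", CType(arrFound.ToArray(GetType(String)), String())))
	End Sub
End Module
```

The generic Dictionary and StringComparer.OrdinalIgnoreCase are both available from .NET 2.0 onward, so this should fit the same framework versions as the rest of the demo.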

The AutoTagSubject Function

Finally, we need a function that will take the words we want autotagged, and apply the autotagging to the input string.

Note that before we proceed with tagging, we remove all current hash marks from the input string. That’s to avoid double-hashing words that may have been tagged in the input text.

Function AutoTagSubject(ByVal strInput As String, ByVal arrTerms As ArrayList) As String
	'applies arrTerms as hashtags to strInput
	'removes hashtags first to avoid double-tagging

	Dim strOut As String = strInput
	strOut = strOut.Replace("#", "")

	For Each strTerm As String In arrTerms
		strOut = strOut.Replace(strTerm, "#" & strTerm)
	Next

	Return strOut
End Function
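
One caveat: String.Replace matches substrings, so a term such as Java would also be tagged inside JavaScript. A possible refinement, sketched here as an assumption rather than as part of the demo, is to replace only whole words with a regular expression. The module name and TagWholeWords are hypothetical.

```vbnet
Imports System
Imports System.Collections
Imports System.Text.RegularExpressions

Module TagSketch
	'hypothetical variant of AutoTagSubject that tags whole words only
	Function TagWholeWords(ByVal strInput As String, ByVal arrTerms As ArrayList) As String
		'remove existing hash marks first, as AutoTagSubject does
		Dim strOut As String = strInput.Replace("#", "")

		For Each strTerm As String In arrTerms
			'(?<![\w#]) : not preceded by a word character or a hash mark
			'(?!\w)     : not followed by a word character
			Dim strPattern As String = "(?<![\w#])" & Regex.Escape(strTerm) & "(?!\w)"
			'$& in the replacement stands for the matched text
			strOut = Regex.Replace(strOut, strPattern, "#$&")
		Next

		Return strOut
	End Function

	Sub Main()
		Dim arrTerms As New ArrayList(New String() {"Java", "Google"})
		'prints: #Java and JavaScript; #Google owns #Google
		Console.WriteLine(TagWholeWords("Java and JavaScript; Google owns Google", arrTerms))
	End Sub
End Module
```

Because the lookbehind rejects a preceding hash mark, passing the same term twice doesn't double-tag it. Unlike the word definition used in ExtractTerms, this pattern also tags a term that is followed by punctuation.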

The btnSubmit_click Subroutine

We now need a simple subroutine to invoke our functions and autotag the input text.

Sub btnSubmit_click(ByVal Sender As Object, ByVal E As EventArgs) Handles btnSubmit.Click
	'get terms from database
	Dim arrTerms As ArrayList = GetAllTerms()

	'get unique words from input text
	Dim arrWords As ArrayList = ExtractTerms(tbInput.Text)

	If arrTerms.Count < 1 Then
		lblResult.Text = "There are no terms in the database, or there was an error retrieving the terms."
		lblResult.CssClass = "warning"
	ElseIf arrWords.Count < 1 Then
		lblResult.Text = "There are no words in the string to be tagged."
		lblResult.CssClass = "warning"
	Else
		'get matches between terms and input words
		Dim arrHashes As ArrayList = CompareLists(arrTerms, arrWords)

		If arrHashes.Count < 1 Then
			lblResult.Text = "There were no matches between the input text and the terms in the database."
			lblResult.CssClass = ""
		Else
			'display found terms
			Dim sbMsg As New StringBuilder("The following terms were found: ")
			For Each strTerm As String In arrHashes
				sbMsg.Append(strTerm)
				sbMsg.Append(", ")
			Next
			sbMsg.Remove(sbMsg.Length - 2, 2)
			lblResult.Text = sbMsg.ToString()
			lblResult.CssClass = ""

			'autotag input string
			tbInput.Text = AutoTagSubject(tbInput.Text, arrHashes)
		End If
	End If

End Sub

And that’s all there is to it. You can see a working demo at http://www.dougv.net/demos/auto_hashtag/

You can also download the demo code. I distribute code under the GNU GPL.

All links in this post on delicious: http://www.delicious.com/dougvdotcom/automatically-hash-tagging-text-with-asp-net-web-forms-vb-net
