Thursday, January 29, 2009

The URL swapping Gedankenversuch

Tip of the hat to Mr. D. H., whose Twitter question (shown below) inspired this entry.

"How could I search a string for urls, if found, do something, then replace the new url?"

I'll be the first to admit I'm not the world's greatest ColdFusion coder, but this problem is more difficult and more subtle than you'd imagine. The "replace the new URL" part is actually fairly easy with ColdFusion's Replace function. But first you need to successfully extract the original URL from a string of text, such as a Twitter entry.
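To make the easy half concrete, here is a minimal sketch of the swap itself, assuming the original URL has already been pulled out of the text somehow. The oldURL and newURL values are made up for illustration.

```cfml
<!--- hypothetical values, for illustration only --->
<cfset tweet = "I should update my blog http://anerroroccurred.blogspot.com more frequently.">
<cfset oldURL = "http://anerroroccurred.blogspot.com">
<cfset newURL = "http://www.example.com/short">

<!--- Replace swaps only the first occurrence unless a fourth argument of "all" is passed --->
<cfset swapped = Replace(tweet, oldURL, newURL)>
<cfoutput>#swapped#</cfoutput>
```

Note that Replace is case-sensitive; ReplaceNoCase is the case-insensitive sibling.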

<!---
NOTE: this code assumes . . .

A) all valid links must start with http://
B) links contain no spaces (so http://www.exam ple.com/ would be cut off at the gap)
C) only one link per text string, for now, please! We can add more later.
--->

<cfset tweet = "I should update my blog http://anerroroccurred.blogspot.com more frequently.">

<!--- position of the first "http://", or 0 if there is none --->
<cfset startsAt = FindNoCase("http://", tweet)>

<cfif startsAt IS 0>
<!--- we can't find any URLs in this text, so do nothing --->
<cfelse>
<p>URL starts at: <cfoutput>#startsAt#</cfoutput></p>
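<!--- the URL runs up to the next space ... --->
<cfset endsAt = FindNoCase(" ", tweet, startsAt)>
<cfif endsAt IS 0>
<!--- ... or to the end of the string when no space follows it; the + 1 keeps Mid from dropping the last character --->
<cfset endsAt = Len(tweet) + 1>
</cfif>
<p>URL ends at: <cfoutput>#endsAt#</cfoutput></p>
<cfset theURLis = Mid(tweet, startsAt, endsAt - startsAt)>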
<cfoutput>And the URL is: #theURLis#</cfoutput>
</cfif>

Looks promising, right? That is, until you realize that people do all sorts of crazy things with URLs in text.

For example:

"My blog is at http://www.example.com. You should check it out!" (That period after the URL means trouble for our code above!)

"I have a blog entry (http://www.example.com/firstServed.htm) you might enjoy." (Great, now we have to anticipate parentheses!?)

Clearly, the Find and Mid functions alone aren't gonna be robust enough to handle our challenge. We need some serious black magic.
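One possible direction, offered as a rough sketch rather than a finished answer, is to let ColdFusion's regular expression functions do the hunting and then trim anything that looks like trailing punctuation. The big assumption here is that a URL never legitimately ends in a period, comma, or closing parenthesis, which is not always true.

```cfml
<cfset tweet = "I have a blog entry (http://www.example.com/firstServed.htm) you might enjoy.">

<!--- match http:// followed by a run of non-space characters; the fourth argument asks for position and length --->
<cfset match = REFindNoCase("http://[^[:space:]]+", tweet, 1, true)>

<cfif match.pos[1] GT 0>
<cfset theURLis = Mid(tweet, match.pos[1], match.len[1])>
<!--- assumption: trailing . , ; : ! ? ) characters are sentence punctuation, not part of the URL --->
<cfset theURLis = REReplace(theURLis, "[.,;:!?)]+$", "")>
<cfoutput>And the URL is: #theURLis#</cfoutput>
</cfif>
```

This still honors assumptions A and C from the first listing, and it would mangle a link that really does end in a parenthesis, so treat it as a starting point only.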
