
I am wondering what the fastest, most efficient way is (using Java) to search a large string and do a find-and-replace such as:

find

'http://www.stackoverflow.com' 

within the body of a long string and replace it with

'<a href="http://www.stackoverflow.com">http://www.stackoverflow.com</a>' 

Now, before you suggest using XSL to do this: it is already out of the question.

In a nutshell, I would like to know how to find any instance of a URL within a long string and wrap it with the appropriate element so that, when the page renders on the web, it auto-links. Thanks.

Steve McLeod
kevin sufferdini

4 Answers


Regular expressions to the rescue! Look at this question: "Regular expression to match URLs in Java".

Just use Matcher to find and replace each match, instead of only finding it as in that question.

For completeness' sake, here is some code that does what you want.
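A minimal sketch of the Matcher-based find-and-replace, assuming a deliberately simple, illustrative URL pattern (not the one from the linked question). `AutoLinker` and `linkify` are hypothetical names. It loops over `find()`, appends each anchor with `appendReplacement`, and finishes with `appendTail`:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, for illustration only.
public class AutoLinker {
    // Illustrative pattern (an assumption, not a full URL grammar):
    // "http://" or "https://", then characters that are not whitespace,
    // quotes or angle brackets; the last character may not be common
    // trailing punctuation, so "see http://example.com." keeps its period.
    private static final Pattern URL =
            Pattern.compile("https?://[^\\s<>\"']*[^\\s<>\"'.,;:!?)]");

    public static String linkify(String text) {
        Matcher m = URL.matcher(text);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String url = m.group();
            String anchor = "<a href=\"" + url + "\">" + url + "</a>";
            // quoteReplacement stops '$' or '\' inside a URL from being
            // read as group references by appendReplacement.
            m.appendReplacement(out, Matcher.quoteReplacement(anchor));
        }
        m.appendTail(out); // copy the text after the last match
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(linkify("Ask on http://www.stackoverflow.com today."));
    }
}
```

Because it scans the string once and builds the result in a single buffer, this stays linear in the length of the input for a simple pattern like this one.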

NOTE: If the string being processed may already contain anchor tags with URLs, you cannot rely on a regex alone; you must parse the text as HTML and run the regex replace only on the text nodes.

Community
Daniel Moses
  • @TedPrz I'm glad you got my reference. Sadly, regex is not sufficient to do it perfectly, but it does a pretty darn good job. – Daniel Moses Jan 19 '12 at 19:09

I don't know about the most efficient in terms of CPU cycles, but I would use regexes. From a programming perspective they are the quickest, most concise, and cleanest way to express this.

You can either use a Pattern and Matcher (see http://docs.oracle.com/javase/6/docs/api/java/util/regex/Pattern.html) or, even easier, use the shortcut method String.replaceAll that is already part of the String class.

myString = myString.replaceAll("(<URL REGEX>)", "New String $1 Here");

where $1 is replaced with whatever group #1 of the search pattern matched (Strings are immutable, so keep the returned value). You can also use online tools to test the regex while you are coding it, such as http://www.fileformat.info/tool/regex.htm.

Depending on the kind of matching you need, you can try the following regex, or do a quick online search for a better one.

(http|ftp|https):\/\/[\w\-_]+(\.[\w\-_]+)+([\w\-\.,@?^=%&:/~\+#]*[\w\-\@?^=%&/~\+#])?
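Plugged into String.replaceAll, that regex might be used as below. This is a sketch, assuming the pattern is written as a Java string literal: backslashes are doubled, and the `\/` escapes are dropped because `/` needs no escaping in Java. `ReplaceAllLinker` and `linkify` are hypothetical names. Since `$0` refers to the entire match, no extra outer group is needed.

```java
// Hypothetical helper class, for illustration only.
public class ReplaceAllLinker {
    // The regex above as a Java string literal ("\\" for each backslash).
    private static final String URL_REGEX =
            "(http|ftp|https)://[\\w\\-_]+(\\.[\\w\\-_]+)+"
            + "([\\w\\-\\.,@?^=%&:/~\\+#]*[\\w\\-\\@?^=%&/~\\+#])?";

    public static String linkify(String text) {
        // $0 is the whole match; replaceAll returns a new String.
        return text.replaceAll(URL_REGEX, "<a href=\"$0\">$0</a>");
    }

    public static void main(String[] args) {
        System.out.println(linkify("Try http://www.stackoverflow.com for answers."));
    }
}
```

Note that replaceAll recompiles the pattern on every call; if you linkify many strings, compiling a Pattern once and reusing it avoids that cost.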

If you have never used regexes before, once you start, you'll fall in love with them. The downside of regexes, of course, is that they are slower than a simple literal search and replace, but they are significantly more flexible.

Good luck.

Eric

Eric B.

Don't bother with regexes if you're hunting for a literal string. Just use String.replace to do literal replacements.
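For the single literal URL from the question, a minimal sketch could look like this. `LiteralReplace` and `link` are hypothetical names. String.replace(CharSequence, CharSequence) treats both arguments literally, so characters such as `.` and `$` have no special meaning, and it replaces every occurrence:

```java
// Hypothetical helper class, for illustration only.
public class LiteralReplace {
    public static String link(String body, String url) {
        String anchor = "<a href=\"" + url + "\">" + url + "</a>";
        // Literal replacement of every occurrence; no regex syntax involved.
        return body.replace(url, anchor);
    }

    public static void main(String[] args) {
        System.out.println(link("Visit http://www.stackoverflow.com for help.",
                "http://www.stackoverflow.com"));
    }
}
```

This only helps when you know the exact URL in advance; it cannot find arbitrary URLs.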

Louis Wasserman
  • Agreed, much more efficient, but his question goes further and indicates that he is looking for a generic way to wrap all URLs in HTML tags. – Eric B. Jan 19 '12 at 20:09

Of course, the more I think about this, the more I wonder whether there is a better solution. I have posted this as a separate answer because it does not directly answer your question; however, it is a potential solution to your problem.

Instead of parsing everything in Java, you can let the web browser do the autolinking itself. Several JS libraries already do this. You could probably code something in jQuery fairly easily, or use an existing plugin that someone else has written.

A quick Google search finds http://codesnipp.it/javascript/jquery-plugin-to-auto-link-urls, a jQuery plugin.

Eric B.