
In my application I need to find each link in a text and break it if it is longer than some limit, say 10 characters. The problem is that if I pass the whole text, for example "this is my website www.stackoverflow.com", directly to this matcher:

Pattern patt = Pattern.compile("(?i)\\b((?:https?://|www\\d{0,3}[.]|[a-z0-9.\\-]+[.][a-z]{2,4}/)(?:[^\\s()<>]+|\\(([^\\s()<>]+|(\\([^\\s()<>]+\\)))*\\))+(?:\\(([^\\s()<>]+|(\\([^\\s()<>]+\\)))*\\)|[^\\s`!()\\[\\]{};:\'\".,<>?«»“”‘’]))");
Matcher matcher = patt.matcher(text);
matcher.replaceAll("<a href=\"http://$1\" target=\"_blank\">$1</a>");

the result shows the whole website as the link text, without breaking it.

What I am trying to do is get the value of $1 so that I can break the second occurrence (the link text) while keeping the first one (the href) intact.

I already have another method that breaks the string up.

All I need is to get the matched website on its own so that I can break it afterwards.

VLAZ
Gondim
  • Nastiest regex I've ever seen. – toto2 Jun 09 '11 at 18:12
  • I am not sure I can understand what's your goal. Could you post the example input and expected output for it? – Grzegorz Oledzki Jun 09 '11 at 18:28
  • It is almost certainly easier and definitely more readable if you simply output the http part you extract from the string to a variable and then breaking that variable into 10 char pieces through a separate function than to keep adding to that regex and have it done in one line, if it's even possible. In fact, I can't believe you're seriously contemplating how to **add** to that pattern. – NorthGuard Jun 09 '11 at 19:42

2 Answers


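You can't do this with replaceAll and a replacement string, because the same template is applied to every match and you get no chance to transform the captured group. Instead, iterate through the matches and process each one individually. Java's Matcher already has an API for this: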

 // Expanding on the example in the appendReplacement Javadoc:
 Pattern p = Pattern.compile("..."); // your URL regexp
 Matcher m = p.matcher(text);
 StringBuffer sb = new StringBuffer();
 while (m.find()) {
     // Keep the first 10 characters of the matched URL and append "..."
     // (URLs shorter than 10 characters are left unchanged).
     String truncatedURL = m.group(1).replaceFirst("^(.{10}).*", "$1...");
     // Emit the opening tag; $1 here expands to the full matched URL.
     m.appendReplacement(sb,
         "<a href=\"http://$1\" target=\"_blank\">");
     // Append the link text directly so $ and \ in it need no escaping.
     sb.append(truncatedURL);
     sb.append("</a>");
 }
 m.appendTail(sb);
 System.out.println(sb.toString());

(For performance, hoist the regex used by replaceFirst out of the loop as a precompiled Pattern; String.replaceFirst recompiles its regex on every call.)

Edit: use sb.append() for the link text so you don't have to worry about escaping $ and \ in truncatedURL.
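For reference, here is a minimal, self-contained sketch of the same find/appendReplacement loop. It rests on a few assumptions that are not in the question: the class name LinkShortener, the helpers shorten and linkify, and a much simpler URL pattern standing in for the original regex. It also only prepends http:// when the match has no scheme, and it does not HTML-escape the URL, which real code probably should.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LinkShortener {
    // Simplified stand-in for the question's URL regex (an assumption, not the original pattern).
    // The last character class keeps trailing punctuation out of the match.
    private static final Pattern URL =
            Pattern.compile("(?i)\\b((?:https?://|www\\.)[^\\s<>\"]+[^\\s<>\"`!()\\[\\]{};:'.,?])");

    // Hypothetical helper: cut the visible link text to maxLen characters.
    static String shorten(String url, int maxLen) {
        return url.length() <= maxLen ? url : url.substring(0, maxLen) + "...";
    }

    static String linkify(String text, int maxLen) {
        Matcher m = URL.matcher(text);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            String url = m.group(1);
            // Add a scheme only when the match does not already carry one.
            String href = url.matches("(?i)https?://.*") ? url : "http://" + url;
            String link = "<a href=\"" + href + "\" target=\"_blank\">"
                    + shorten(url, maxLen) + "</a>";
            // quoteReplacement makes any $ or \ in the URL literal.
            m.appendReplacement(sb, Matcher.quoteReplacement(link));
        }
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(linkify("this is my website www.stackoverflow.com", 10));
        // prints: this is my website <a href="http://www.stackoverflow.com" target="_blank">www.stacko...</a>
    }
}
```

Building the whole replacement as a plain string and passing it through Matcher.quoteReplacement is an alternative to the sb.append() trick above; both avoid having $ or \ in the URL interpreted as group references.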

Peter Davis

I think your problem is similar to the one in this question:

Java : replacing text URL with clickable HTML link

They suggested something like this:

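// Java has no POSIX bracket classes ([:space:], [:alnum:]), so use \s and \p{Alnum}; \w+ keeps the scheme match from swallowing preceding text.
String basicUrlRegex = "(\\w+://[^<>\\s]+[\\p{Alnum}/])";
// Strings are immutable, so keep the result of replaceAll.
myString = myString.replaceAll(basicUrlRegex, "<a href=\"$1\">$1</a>");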
Community