
I need to extract a link from a block of HTML, and I'm using regex patterns for that. The problem is that the match includes the text before and after (.*?). Should it do that? I thought it would include only the text between the boundaries.

I've modified the code a little, and now the match only includes the trailing quote.

Pattern p = Pattern.compile("http://cdn.posh24.se/images/:profile(.*?)");
Matcher m = p.matcher(splitStrings[0]);

This is the output: [http://cdn.posh24.se/images/:profile/088484075fb5b4418f5cb8814728decab",...

This is the expected output: [http://cdn.posh24.se/images/:profile/088484075fb5b4418f5cb8814728decab

2 Answers


You can do something like this:

Pattern p = Pattern.compile("http://cdn.posh24.se/images/:profile(.*?)(?=\")");

The (?=\") construct is called a positive lookahead. It requires a double quote to follow the match, but it does not include that quote in the matched text.
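To see the lookahead in action, here is a minimal runnable sketch. The class name LookaheadDemo, the helper extractLinks, and the sample HTML string are made up for illustration and are not the asker's real code or data. It also wraps the prefix in Pattern.quote so the dots match literally, which is my addition rather than part of the pattern above. Without the lookahead, a lazy (.*?) at the very end of a pattern is free to match nothing; the lookahead forces it to grow until just before the next quote.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LookaheadDemo {
    // Prefix taken from the question; Pattern.quote makes the dots literal (an added assumption).
    private static final Pattern LINK = Pattern.compile(
            Pattern.quote("http://cdn.posh24.se/images/:profile") + ".*?(?=\")");

    // Hypothetical helper: collects every match found in the given text.
    static List<String> extractLinks(String html) {
        List<String> links = new ArrayList<>();
        Matcher m = LINK.matcher(html);
        while (m.find()) {
            // group() is the whole match; the quote checked by the lookahead is not part of it.
            links.add(m.group());
        }
        return links;
    }

    public static void main(String[] args) {
        // Made-up sample input, not the asker's actual HTML.
        String html = "<img src=\"http://cdn.posh24.se/images/:profile/088484075fb5b4418f5cb8814728decab\" alt=\"x\">";
        System.out.println(extractLinks(html));
    }
}
```

The lazy quantifier stops at the first position where a quote follows, so the match ends right before the closing quote of the attribute.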

Saeed Entezari
Pattern p = Pattern.compile("http://cdn.posh24.se/images/:profile([^\"]*)");
Matcher m = p.matcher(splitStrings[0]);

while (m.find()) {
    System.out.println(m.group(0));
}
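The snippet above relies on a negated character class: [^\"]* consumes everything up to, but not including, the next double quote. Here is a self-contained sketch of the same idea, showing the difference between group(0) (the whole link) and group(1) (only the part after the prefix). The class name NegatedClassDemo, the two helper methods, and the sample input are hypothetical, and Pattern.quote is again my own addition to keep the dots literal.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NegatedClassDemo {
    // Prefix from the question, followed by a capturing group that stops at the first quote.
    private static final Pattern LINK = Pattern.compile(
            Pattern.quote("http://cdn.posh24.se/images/:profile") + "([^\"]*)");

    // Hypothetical helper: returns the full links (group 0).
    static List<String> extractLinks(String html) {
        List<String> links = new ArrayList<>();
        Matcher m = LINK.matcher(html);
        while (m.find()) {
            links.add(m.group(0));
        }
        return links;
    }

    // Hypothetical helper: returns only the text captured after the prefix (group 1).
    static List<String> extractSuffixes(String html) {
        List<String> suffixes = new ArrayList<>();
        Matcher m = LINK.matcher(html);
        while (m.find()) {
            suffixes.add(m.group(1));
        }
        return suffixes;
    }

    public static void main(String[] args) {
        // Made-up sample input with two links.
        String html = "<a href=\"http://cdn.posh24.se/images/:profile/aaa\"></a>"
                + "<a href=\"http://cdn.posh24.se/images/:profile/bbb\"></a>";
        System.out.println(extractLinks(html));
        System.out.println(extractSuffixes(html));
    }
}
```

Because the character class can never match a quote, no lookahead or lazy quantifier is needed here, and the match cannot run past the end of the attribute value.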
Jason Aller
Marc G. Smith