2

I'm trying to parse a string for any occurrences of Markdown-style links, i.e. [text](link). I can get the first link in a string, but if there are multiple links I can't access the rest. Here is what I've tried (you can run it on ideone):

Pattern p;
try {
    p = Pattern.compile("[^\\[]*\\[(?<text>[^\\]]*)\\]\\((?<link>[^\\)]*)\\)(?:.*)");
} catch (PatternSyntaxException ex) {
    System.out.println(ex);
    throw(ex);
}
Matcher m1 = p.matcher("Hello");
Matcher m2 = p.matcher("Hello [world](ladies)");
Matcher m3 = p.matcher("Well, [this](that) has [two](too many) keys.");
System.out.println("m1 matches: " + m1.matches());  // false
System.out.println("m2 matches: " + m2.matches());  // true
System.out.println("m3 matches: " + m3.matches());  // true
System.out.println("m2 text: " + m2.group("text")); // world
System.out.println("m2 link: " + m2.group("link")); // ladies
System.out.println("m3 text: " + m3.group("text")); // this
System.out.println("m3 link: " + m3.group("link")); // that
System.out.println("m3 end: " + m3.end());          // 44 - I want 18
System.out.println("m3 count: " + m3.groupCount()); // 2 - I want 4
System.out.println("m3 find: " + m3.find());        // false - I want true

I know I can't have repeating groups, but I figured find() would work. However, it doesn't behave as I expected. How can I modify my approach so that I can parse each link?

2rs2ts

2 Answers

1

Can't you go through the matches one by one and do the next match from an index after the previous match? You can use this regex:

\[(?<text>[^\]]*)\]\((?<link>[^\)]*)\)

find() looks for the next part of the input that matches the pattern, so a match can be a substring of the entire string. Each call to find() moves on to the next match. matches(), on the other hand, tries to match the entire string and fails if it can't. Use something like this:

while (m.find()) {
    String text = m.group("text"); // contents of the [...] part
    String link = m.group("link"); // contents of the (...) part
}
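For completeness, here is a minimal, self-contained sketch of that loop applied to the string from the question. The class name `MarkdownLinks` and the helper `extractLinks` are made up for illustration. The pattern is the one suggested above, written as a Java string literal.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, only for illustration.
class MarkdownLinks {
    // The pattern from this answer; no leading or trailing ".*",
    // so each match covers exactly one [text](link) pair.
    private static final Pattern LINK =
            Pattern.compile("\\[(?<text>[^\\]]*)\\]\\((?<link>[^\\)]*)\\)");

    // Collects every link in the input as "text -> link".
    static List<String> extractLinks(String input) {
        List<String> links = new ArrayList<>();
        Matcher m = LINK.matcher(input);
        // Each successful find() advances to the next match.
        while (m.find()) {
            links.add(m.group("text") + " -> " + m.group("link"));
        }
        return links;
    }

    public static void main(String[] args) {
        String input = "Well, [this](that) has [two](too many) keys.";
        for (String link : extractLinks(input)) {
            System.out.println(link);
        }
        // prints:
        // this -> that
        // two -> too many
    }
}
```

Note that the loop never calls matches(). With this pattern, matches() would return false for any input that has text before or after the link, which is expected.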
Farhad Alizadeh Noori
  • I tried a similar syntax, but `m3.matches()` was `false`, probably because of the trailing characters which aren't `)`. Any suggestions on how I can work around that? – 2rs2ts May 14 '14 at 14:51
  • I guess the issue with your above pattern is that (?:.*) causes the match to continue all the way to the end of the string. So you pass all other prospective matches in the way. I would use a pattern like I suggested and get all the matches from the string. – Farhad Alizadeh Noori May 14 '14 at 15:01
  • It's strange that matches is false. Find does give the results with the pattern used in answer. Test it here : http://java-regex-tester.appspot.com/ – Farhad Alizadeh Noori May 14 '14 at 15:04
  • `.matches()` is `false` with `"\\[(?<text>[^\\]]*)\\]\\((?<link>[^\\)]*)\\)"`. But `.find()` is `true`. I think I understand what I have to do but could you write up what to do in order to get each pair of groups so I can accept your answer? – 2rs2ts May 14 '14 at 15:34
0

The regular expression I used to match what you need (without groups) is \[\w+\]\(.+\)

This is just to keep things simple. Basically, it does the following:

  • Match an opening square bracket: \[
  • Followed by one or more word characters: \w+
  • Then a closing square bracket: \]

This looks for patterns like [blabla]

Then the same with parentheses:

  • Match an opening parenthesis: \(
  • Followed by one or more of any character: .+
  • Then a closing parenthesis: \)

So it matches (ble...ble...)

Now, if you want to store the matches in groups, you can add extra parentheses like this:

(\[\w+\])(\(.+\)) so that the bracketed words and the parenthesized links are each captured in a group.

Hope this helps.

I've tried it on regexplanet.com and it works.

Update: workaround .*(\[\w+\])(\(.+\))*.*
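One thing to watch for with the grouped pattern above: `.+` is greedy, so when a string contains several links it runs on to the last closing parenthesis. The sketch below compares it with a variant that uses `[^)]+` instead. That variant, the class name `GreedyVsNegated`, and the helper `allMatches` are illustrative additions, not part of the original answer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, only for illustration.
class GreedyVsNegated {
    // Returns every substring of input that the regex matches via find().
    static List<String> allMatches(String regex, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        String input = "Well, [this](that) has [two](too many) keys.";

        // Greedy ".+" swallows everything up to the last ')'.
        System.out.println(allMatches("(\\[\\w+\\])(\\(.+\\))", input));
        // prints: [[this](that) has [two](too many)]

        // "[^)]+" stops at the first ')', so each link is matched separately.
        System.out.println(allMatches("(\\[\\w+\\])(\\([^)]+\\))", input));
        // prints: [[this](that), [two](too many)]
    }
}
```

Also keep in mind that `\w+` does not match spaces, so link text such as [two words] would be skipped by either variant.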

Federico Piazza