Java regex to parse any number of Markdown-style links

Question

I'm trying to parse a string for any occurrences of Markdown-style links, i.e. [text](link). I'm able to get the first link in a string, but if there are multiple links I can't access the rest. Here is what I've tried (you can run it on ideone):

Pattern p;
try {
    p = Pattern.compile("[^\\[]*\\[(?<text>[^\\]]*)\\]\\((?<link>[^\\)]*)\\)(?:.*)");
} catch (PatternSyntaxException ex) {
    System.out.println(ex);
    throw(ex);
}
Matcher m1 = p.matcher("Hello");
Matcher m2 = p.matcher("Hello [world](ladies)");
Matcher m3 = p.matcher("Well, [this](that) has [two](too many) keys.");
System.out.println("m1 matches: " + m1.matches());  // false
System.out.println("m2 matches: " + m2.matches());  // true
System.out.println("m3 matches: " + m3.matches());  // true
System.out.println("m2 text: " + m2.group("text")); // world
System.out.println("m2 link: " + m2.group("link")); // ladies
System.out.println("m3 text: " + m3.group("text")); // this
System.out.println("m3 link: " + m3.group("link")); // that
System.out.println("m3 end: " + m3.end());          // 44 - I want 18
System.out.println("m3 count: " + m3.groupCount()); // 2 - I want 4
System.out.println("m3 find: " + m3.find());        // false - I want true

I know I can't have repeating groups, but I figured find() would work. It does not behave as I expected. How can I modify my approach so that I can parse each link?

Farhad Alizadeh Noori · Accepted Answer · 2014-05-14T15:55:30.377

1

You can go through the matches one by one, with each search starting at the index just after the previous match. Use this regex, which drops the leading [^\[]* and the trailing (?:.*) from your pattern:

\[(?<text>[^\]]*)\]\((?<link>[^\)]*)\)

The find() method scans the input for the next substring that matches the pattern, so the match does not have to cover the entire string. Each call to find() resumes where the previous match ended and returns the next match. matches() succeeds only if the entire string matches the pattern and fails otherwise. Use something like this:

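// p is compiled from the regex above; input is the string to scan
Matcher m = p.matcher(input);
while (m.find()) {
    String text = m.group("text"); // link text of the current match
    String link = m.group("link"); // link target of the current match
}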
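
For completeness, here is a minimal self-contained sketch of that loop. The class name MarkdownLinks and the helper extractLinks are made up for illustration and are not part of the original question. It assumes Java 7 or later, since named groups are used.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical class name, used only for this illustration.
public class MarkdownLinks {
    // Same regex as above: no leading or trailing "match everything" parts.
    private static final Pattern LINK =
            Pattern.compile("\\[(?<text>[^\\]]*)\\]\\((?<link>[^\\)]*)\\)");

    // Hypothetical helper: returns one {text, link} pair per link found in the input.
    public static List<String[]> extractLinks(String input) {
        List<String[]> links = new ArrayList<>();
        Matcher m = LINK.matcher(input);
        // Each find() call resumes after the end of the previous match.
        while (m.find()) {
            links.add(new String[] { m.group("text"), m.group("link") });
        }
        return links;
    }

    public static void main(String[] args) {
        String input = "Well, [this](that) has [two](too many) keys.";
        for (String[] pair : extractLinks(input)) {
            System.out.println(pair[0] + " -> " + pair[1]);
        }
        // matches() needs the whole input to be exactly one link, so this prints false.
        System.out.println(LINK.matcher(input).matches());
    }
}
```

With the sample string from the question, the loop visits both links ("this" with "that", then "two" with "too many"), while matches() on the same input is false because the string contains more than a single link.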

edited May 14 '14 at 15:55

answered May 14 '14 at 14:47


I tried a similar syntax, but `m3.matches()` was `false`, probably because of the trailing characters which aren't `)`. Any suggestions on how I can work around that? – 2rs2ts May 14 '14 at 14:51
I guess the issue with your pattern above is that (?:.*) makes the match continue all the way to the end of the string, so it swallows every other prospective match along the way. I would use a pattern like the one I suggested and get all the matches from the string. – Farhad Alizadeh Noori May 14 '14 at 15:01
It's strange that matches is false. Find does give the results with the pattern used in the answer. Test it here: http://java-regex-tester.appspot.com/ – Farhad Alizadeh Noori May 14 '14 at 15:04
`.matches()` is `false` with `"\\[(?<text>[^\\]]*)\\]\\((?<link>[^\\)]*)\\)"`. But `.find()` is `true`. I think I understand what I have to do but could you write up what to do in order to get each pair of groups so I can accept your answer? – 2rs2ts May 14 '14 at 15:34

Federico Piazza · Answer 2 · 2014-05-14T15:41:32.720

0

The regular expression I used to match what you need (without groups) is \[\w+\]\(.+\)

It is kept simple on purpose. It breaks down as follows:

Match an opening square bracket: \[
Followed by one or more word characters: \w+
Then a closing square bracket: \]

This matches text of the form [blabla]

Then the same with parentheses:

Match an opening parenthesis: \(
Followed by one or more of any character: .+
Then a closing parenthesis: \)

This matches text of the form (ble...ble...)

If you want to store the matches in groups, you can add extra parentheses like this:

(\[\w+\])(\(.+\)) so that group 1 captures the bracketed text and group 2 captures the parenthesized link.
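
One thing to watch for when a string holds several links: .+ is greedy, so it can run on to the last closing parenthesis in the input. The sketch below compares the pattern above with a variant that uses the negated class [^)]+ instead. The variant is my substitution, not part of this answer, and the class name GreedyLinkDemo and the helper allMatches are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical class name, used only for this illustration.
public class GreedyLinkDemo {
    // Hypothetical helper: collects the full text of every match find() reports.
    public static List<String> allMatches(String regex, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        String input = "Well, [this](that) has [two](too many) keys.";
        // Greedy .+ extends to the last ")" in the input, merging both links into one match.
        System.out.println(allMatches("\\[\\w+\\]\\(.+\\)", input));
        // [^)]+ stops at the first ")", so each link is reported as its own match.
        System.out.println(allMatches("\\[\\w+\\]\\([^)]+\\)", input));
    }
}
```

On the sample string from the question, the greedy pattern yields a single match spanning from [this] to the final closing parenthesis, while the negated-class variant yields two separate matches.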

Hope this helps.

I tried it on regexplanet.com and it works.

Update: workaround .*(\[\w+\])(\(.+\))*.*

edited May 14 '14 at 15:41

answered May 14 '14 at 15:09


This suffers from the same problem that I had with Farhad's answer. – 2rs2ts May 14 '14 at 15:27
Can you try this workaround? I've updated the end of the post, since I can't post it here. – Federico Piazza May 14 '14 at 15:39
Right, but my problem is that I can only find the first `default` and first `key` groups. – 2rs2ts May 14 '14 at 15:47
