
I made a regular expression: https://regex101.com/r/ToCwrE/2/

All it should do is extract the function's parameters. I am trying to achieve this with capture groups.

[\s]*javascript:[\s]*m\((-?\d+)[\s]*,[\s]*(-?\d+)[\s]*,[\s]{0,}encodeURIComponent\(\'([^\']+)*\'\)[\s]*,[\s]*(-?\d+)\)[\s]*

Tried it on:

javascript:m(53009,2,encodeURIComponent('7711T'), 22)
javascript:m(52992,2,encodeURIComponent('3013'), 2)
javascript:m(10440,2,encodeURIComponent('F Series'), 11)
javascript:m(53022,2,encodeURIComponent('C 12045'), 85)
javascript:m(53045,2,encodeURIComponent('Prox 8441'), 16)
javascript:m(26016,2,encodeURIComponent('Vard   asd .ious'), 22)

On regex101 and a few similar sites, it correctly returns the matched groups. However, when I try to use it in Java, I don't get the capture groups; it only returns the whole text.

If I copy-paste it into IDEA, it gets escaped automatically (each \ is replaced with \\):

Pattern pattern = Pattern.compile("[\\s]*javascript:[\\s]*m\\((-?\\d+)[\\s]*,[\\s]*(-?\\d+)[\\s]*,[\\s]{0,}encodeURIComponent\\(\\'([^\\']+)*\\'\\)[\\s]*,[\\s]*(-?\\d+)\\)[\\s]*");
Matcher m = pattern.matcher("javascript:m(53022,2,encodeURIComponent('Cr 12045'), 85)");
List<String> groups = new ArrayList<>();
while (m.find()) {
    groups.add(m.group());
}
groups;


What am I missing? How should the regex be converted to get it working in Java?

szab.kel

2 Answers


The regex is matching correctly; the problem is how you read the groups. m.group() with no argument returns the whole match, so you need to ask for each group by its index. The following should help:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class JavaTest {

    public static void main(String[] args) {

        Pattern pattern = Pattern.compile("[\\s]*javascript:[\\s]*m\\((-?\\d+)[\\s]*,[\\s]*(-?\\d+)[\\s]*,[\\s]{0,}encodeURIComponent\\(\\'([^\\']+)*\\'\\)[\\s]*,[\\s]*(-?\\d+)\\)[\\s]*");
        Matcher m = pattern.matcher("javascript:m(53009,2,encodeURIComponent('7711T'), 22)");
        if (m.find()) {
            for (int i=1 ; i <= m.groupCount() ; i++) {
                System.out.println(m.group(i));
            }
        }
    }
}

Provides the output:

53009
2
7711T
22
65Roadster

To get the content of each group, you can use Matcher#group(number) or Matcher#group(name). In your case, to get the content of the first group, use m.group(1) and you will get 53022.

The problem with m.group() is that it is the same as m.group(0), so it returns the content of group 0, which holds the match for the whole pattern.
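
As a rough sketch of the Matcher#group(name) variant mentioned above: the pattern below is the one from the question with names attached to its four groups. The names id, mode, label and count, the class name and the extract helper are made up for this illustration; they are not part of the original regex.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NamedGroupDemo {

    // The question's pattern, unchanged except for the (?<name>...) syntax.
    // The group names id, mode, label and count are hypothetical.
    static final Pattern CALL = Pattern.compile(
            "[\\s]*javascript:[\\s]*m\\((?<id>-?\\d+)[\\s]*,[\\s]*(?<mode>-?\\d+)[\\s]*,"
            + "[\\s]{0,}encodeURIComponent\\(\\'(?<label>[^\\']+)*\\'\\)[\\s]*,[\\s]*(?<count>-?\\d+)\\)[\\s]*");

    // Returns the content of the named group, or null if the input does not match.
    static String extract(String input, String groupName) {
        Matcher m = CALL.matcher(input);
        return m.find() ? m.group(groupName) : null;
    }

    public static void main(String[] args) {
        String input = "javascript:m(53022,2,encodeURIComponent('Cr 12045'), 85)";
        Matcher m = CALL.matcher(input);
        if (m.find()) {
            System.out.println(m.group().equals(m.group(0))); // true: both are the whole match
            System.out.println(m.group(1));                   // 53022, first group by number
            System.out.println(m.group("label"));             // Cr 12045, third group by name
        }
    }
}
```

Named groups still keep their numbers, so m.group(1) and m.group("id") refer to the same group here.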

To iterate over all groups, use a simple for loop. Matcher#groupCount tells you how many groups the pattern has.

So, to collect the results from all groups, you can use:

Pattern p = Pattern.compile("[\\s]*javascript:[\\s]*m\\((-?\\d+)[\\s]*,[\\s]*(-?\\d+)[\\s]*,[\\s]{0,}encodeURIComponent\\(\\'([^\\']+)*\\'\\)[\\s]*,[\\s]*(-?\\d+)\\)[\\s]*");
Matcher m = p.matcher("javascript:m(53022,2,encodeURIComponent('Cr 12045'), 85)");
List<String> groups = new ArrayList<>();
while (m.find()) {
    for (int i=1; i<=m.groupCount(); i++){
        groups.add(m.group(i));
    }
}

System.out.println(groups); //[53022, 2, Cr 12045, 85]

BTW

  • \s is already a character class, so it doesn't need to be nested in [..]; instead of [\\s]* you can write \\s*.
  • {0,} is the same as *, so I don't see any reason to mix the two; use * everywhere.
  • ' is not a regex metacharacter, so it doesn't need escaping.
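
Putting those three tips together, the pattern could look roughly like the sketch below. Only the three simplifications listed above are applied; the class name and the parameters helper are invented for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SimplifiedPatternDemo {

    // Same pattern as before with: [\\s] -> \\s, {0,} -> *, \\' -> '
    static final Pattern CALL = Pattern.compile(
            "\\s*javascript:\\s*m\\((-?\\d+)\\s*,\\s*(-?\\d+)\\s*,"
            + "\\s*encodeURIComponent\\('([^']+)*'\\)\\s*,\\s*(-?\\d+)\\)\\s*");

    // Collects groups 1..groupCount() of the first match; empty list if nothing matches.
    static List<String> parameters(String input) {
        List<String> groups = new ArrayList<>();
        Matcher m = CALL.matcher(input);
        if (m.find()) {
            for (int i = 1; i <= m.groupCount(); i++) {
                groups.add(m.group(i));
            }
        }
        return groups;
    }

    public static void main(String[] args) {
        System.out.println(parameters(
                "javascript:m(53022,2,encodeURIComponent('Cr 12045'), 85)")); // [53022, 2, Cr 12045, 85]
    }
}
```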
Pshemo
  • Thanks for the regex tips, even though I know them and I just played around too much with the regex, because I thought it was the cause :( – szab.kel Jul 06 '17 at 06:21