Pattern/Matcher group() to obtain substring in Java?

Question

UPDATE: Thanks for all the great responses! I tried many different regex patterns but didn't understand why m.matches() was not doing what I think it should be doing. When I switched to m.find() instead, as well as adjusting the regex pattern, I was able to get somewhere.

I'd like to match a pattern in a Java string and then extract the portion matched using a regex (like Perl's $& operator).

This is my source string "s": "DTSTART;TZID=America/Mexico_City:20121125T153000". I want to extract the portion "America/Mexico_City".

I thought I could use Pattern and Matcher and then extract using m.group() but it's not working as I expected. I've tried monkeying with different regex strings and the only thing that seems to hit on m.matches() is ".*TZID.*" which is pointless as it just returns the whole string. Could someone enlighten me?

 Pattern p = Pattern.compile ("TZID*:"); // <- change to "TZID=([^:]*):"
 Matcher m = p.matcher (s);
 if (m.matches ()) // <- change to m.find()
    Log.d (TAG, "looking at " + m.group ()); // <- change to m.group(1)

Looks like a line from an ics (iCal) file - why don't you use http://ical4j.sourceforge.net/ or equivalent? — jrtc27, Dec 07 '12 at 17:09
Indeed. I started with ical4j but it hurled with an error when parsing the ics file so ditched it. I may try it again if I need more functionality than just extracting the DTSTART lines. — wufoo, Dec 14 '12 at 18:06

score 16 · Accepted Answer · answered Dec 07 '12 at 17:21

You are using m.matches(), which tries to match the whole string. If you use m.find() instead, it will search for a match anywhere inside the string. I also tweaked your regexp to exclude the TZID= prefix from the match by using a zero-width lookbehind:

     Pattern p = Pattern.compile("(?<=TZID=)[^:]+");
     Matcher m = p.matcher ("DTSTART;TZID=America/Mexico_City:20121125T153000");
     if (m.find()) {
         System.out.println(m.group());
     }
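To make the difference between matches() and find() concrete, here is a minimal self-contained sketch. The class name TzidFindDemo and the helper extractTzid are made up for illustration; only Pattern, Matcher, find(), matches() and group() come from java.util.regex.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TzidFindDemo {
    // Same lookbehind pattern as in the answer above.
    static final Pattern TZID = Pattern.compile("(?<=TZID=)[^:]+");

    // Hypothetical helper: returns the TZID value, or null if there is none.
    static String extractTzid(String line) {
        Matcher m = TZID.matcher(line);
        // find() scans for the pattern anywhere in the input.
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String s = "DTSTART;TZID=America/Mexico_City:20121125T153000";

        // matches() requires the pattern to cover the entire input, so it fails here.
        System.out.println(TZID.matcher(s).matches()); // false

        // find() only needs a matching substring.
        System.out.println(extractTzid(s)); // America/Mexico_City
    }
}
```

Returning null for "no TZID" is just one possible convention; an Optional<String> would work equally well.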

score 6 · Answer 2 · answered Dec 07 '12 at 17:11

This should work nicely:

Pattern p = Pattern.compile("TZID=(.*?):");
Matcher m = p.matcher(s);
if (m.find()) {
    String zone = m.group(1); // group(0) is the whole match; capturing groups start at 1
    . . .
}

An alternative regex is "TZID=([^:]*)". I'm not sure which is faster.
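Speed aside, the two patterns are not quite interchangeable: the first one requires a colon after the value, the alternative does not. The sketch below shows where they agree and where they differ; the class name TzidRegexCompare and the helper firstGroup are hypothetical, and nothing here measures performance.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TzidRegexCompare {
    static final Pattern LAZY = Pattern.compile("TZID=(.*?):");      // reluctant quantifier, needs a ':'
    static final Pattern NEGATED = Pattern.compile("TZID=([^:]*)");  // stops at ':' or at end of input

    // Hypothetical helper: first capturing group of the first match, or null.
    static String firstGroup(Pattern p, String input) {
        Matcher m = p.matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String full = "DTSTART;TZID=America/Mexico_City:20121125T153000";
        String noColon = "TZID=America/Mexico_City";

        // On the string from the question both patterns capture the same text.
        System.out.println(firstGroup(LAZY, full));     // America/Mexico_City
        System.out.println(firstGroup(NEGATED, full));  // America/Mexico_City

        // Without a trailing colon only the negated-class version still matches.
        System.out.println(firstGroup(LAZY, noColon));    // null
        System.out.println(firstGroup(NEGATED, noColon)); // America/Mexico_City
    }
}
```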

Ted Hopp

Sergey Kalinichenko · Answer 3 · 2012-12-07T17:11:50.760

You are missing a dot before the asterisk. As written, the asterisk applies only to the D, so your expression matches "TZI" followed by any number of uppercase Ds and then a colon.

Pattern p = Pattern.compile ("TZID[^:]*:");

You should also add a capturing group unless you want to capture everything, including the "TZID" and the ":"

Pattern p = Pattern.compile ("TZID=([^:]*):");

Finally, you should use the right API to search the string, rather than attempting to match the string in its entirety.

Pattern p = Pattern.compile("TZID=([^:]*):");
Matcher m = p.matcher("DTSTART;TZID=America/Mexico_City:20121125T153000");
if (m.find()) {
    System.out.println(m.group(1));
}

This prints

America/Mexico_City

score 2 · Answer 4 · answered Dec 07 '12 at 17:09

You are using the wrong pattern. Try this:

Pattern p = Pattern.compile(".*?TZID=([^:]+):.*");
Matcher m = p.matcher (s);
if (m.matches ())
    Log.d (TAG, "looking at " + m.group(1));

.*? matches anything at the beginning up to TZID=, and then TZID= itself matches. A group then opens and captures everything up to the colon. After the group closes, : matches and .* consumes the rest of the string, so matches() succeeds on the whole input and what you need is in group(1).

Polyana Fontes

I deleted my nearly identical answer in favor of yours. You had a slightly better regex. – mikeslattery Dec 07 '12 at 17:11

score 1 · Answer 5 · answered Dec 07 '12 at 17:14

Why not simply use split, as in:

  String origStr = "DTSTART;TZID=America/Mexico_City:20121125T153000";
  String str = origStr.split(":")[0].split("=")[1];
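The one-liner above throws an ArrayIndexOutOfBoundsException if the part before the first colon contains no "=". A slightly more defensive sketch is shown below. The class name TzidSplitDemo and the helper tzidViaSplit are hypothetical, and like the original it assumes TZID is the only parameter before the first colon.

```java
class TzidSplitDemo {
    // Hypothetical helper: text after the first '=' in the part before the
    // first ':', or null when there is no '=' there.
    static String tzidViaSplit(String line) {
        // Limit 2 keeps everything after the first separator in one piece.
        String beforeColon = line.split(":", 2)[0];
        String[] parts = beforeColon.split("=", 2);
        return parts.length == 2 ? parts[1] : null;
    }

    public static void main(String[] args) {
        System.out.println(tzidViaSplit("DTSTART;TZID=America/Mexico_City:20121125T153000")); // America/Mexico_City
        System.out.println(tzidViaSplit("DTSTART:20121125T153000")); // null
    }
}
```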

Yogendra Singh

Very elegant solution. Thanks – wufoo Dec 07 '12 at 20:29
