
UPDATE: Thanks for all the great responses! I tried many different regex patterns but didn't understand why m.matches() was not doing what I thought it should. When I switched to m.find() and adjusted the regex pattern, I was able to get somewhere.


I'd like to match a pattern in a Java string and then extract the matched portion using a regex (like Perl's $& variable).

This is my source string "s": DTSTART;TZID=America/Mexico_City:20121125T153000 I want to extract the portion "America/Mexico_City".

I thought I could use Pattern and Matcher and then extract using m.group() but it's not working as I expected. I've tried monkeying with different regex strings and the only thing that seems to hit on m.matches() is ".*TZID.*" which is pointless as it just returns the whole string. Could someone enlighten me?

Pattern p = Pattern.compile("TZID*:"); // <- change to "TZID=([^:]*):"
Matcher m = p.matcher(s);
if (m.matches()) // <- change to m.find()
    Log.d(TAG, "looking at " + m.group()); // <- change to m.group(1)
nhahtdh
wufoo
  • Looks like a line from an ics (iCal) file - why don't you use http://ical4j.sourceforge.net/ or equivalent? – jrtc27 Dec 07 '12 at 17:09
  • Indeed. I started with ical4j, but it threw an error when parsing the ics file, so I ditched it. I may try it again if I need more functionality than just extracting the DTSTART lines. – wufoo Dec 14 '12 at 18:06

5 Answers


You use m.matches(), which tries to match the whole string. If you use m.find() instead, it searches for a match anywhere inside the string. I also improved your regexp a bit to exclude the TZID= prefix using a zero-width lookbehind:

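     import java.util.regex.Matcher;
     import java.util.regex.Pattern;

     // (?<=TZID=) requires "TZID=" right before the match but does not include it
     Pattern p = Pattern.compile("(?<=TZID=)[^:]+");
     Matcher m = p.matcher("DTSTART;TZID=America/Mexico_City:20121125T153000");
     if (m.find()) {
         System.out.println(m.group()); // America/Mexico_City
     }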
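To make the difference between the three Matcher search methods concrete, here is a small sketch run against the question's sample line. The class name MatchVsFind and the helper extractTzid are hypothetical names chosen for illustration; the pattern is the capturing variant suggested in the question's update rather than the lookbehind above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MatchVsFind {
    static final Pattern TZID = Pattern.compile("TZID=([^:]*):");

    // Hypothetical helper: returns the TZID value using find(), or null if absent.
    static String extractTzid(String line) {
        Matcher m = TZID.matcher(line);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String s = "DTSTART;TZID=America/Mexico_City:20121125T153000";

        // matches(): the pattern must cover the entire input, so this is false.
        System.out.println(TZID.matcher(s).matches());   // false
        // lookingAt(): the pattern must match starting at index 0; s starts with "DTSTART".
        System.out.println(TZID.matcher(s).lookingAt()); // false
        // find(): scans forward for the next match anywhere in the input.
        System.out.println(extractTzid(s));              // America/Mexico_City
    }
}
```

In short, matches() is the right call only when the regex is meant to describe the whole string; for pulling a piece out of a longer line, find() is the usual choice.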
hoaz

This should work nicely:

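import java.util.regex.Matcher;
import java.util.regex.Pattern;

Pattern p = Pattern.compile("TZID=(.*?):");
Matcher m = p.matcher(s);
if (m.find()) {
    String zone = m.group(1); // capturing groups are numbered from 1; group 0 is the whole match
    // . . .
}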

An alternative regex is "TZID=([^:]*)". I'm not sure which is faster.
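A quick sketch comparing the two patterns. On the sample line they capture the same text; they differ when no colon follows the value, because "TZID=(.*?):" requires the trailing colon and "TZID=([^:]*)" does not. The class name and the "DTSTART;TZID=UTC" input are hypothetical. As for speed, the negated class lets the engine stop at the first colon without the lazy quantifier's step-by-step expansion, but for strings this short any difference is unlikely to be noticeable.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LazyVsNegated {
    static final Pattern LAZY = Pattern.compile("TZID=(.*?):");
    static final Pattern NEGATED = Pattern.compile("TZID=([^:]*)");

    // Returns group(1) of the first match found by find(), or null if there is none.
    static String firstGroup(Pattern p, String input) {
        Matcher m = p.matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String s = "DTSTART;TZID=America/Mexico_City:20121125T153000";
        System.out.println(firstGroup(LAZY, s));    // America/Mexico_City
        System.out.println(firstGroup(NEGATED, s)); // America/Mexico_City

        // Hypothetical input where no ':' follows the TZID value.
        String noColon = "DTSTART;TZID=UTC";
        System.out.println(firstGroup(LAZY, noColon));    // null (the trailing ':' is required)
        System.out.println(firstGroup(NEGATED, noColon)); // UTC
    }
}
```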

Ted Hopp

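You are missing something before the asterisk. As written, the * applies only to the preceding D, so your expression matches "TZI" followed by any number of uppercase Ds and then a colon. A dot would let it match any characters; a character class that excludes the colon is more precise: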

Pattern p = Pattern.compile ("TZID[^:]*:");
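To see the quantifier-scope problem directly, this sketch runs the question's original pattern "TZID*:" against a few inputs. The class name and the short test strings are made up for illustration.

```java
import java.util.regex.Pattern;

public class QuantifierScope {
    // The question's original pattern: the '*' repeats only the 'D' right before it.
    static final Pattern ORIGINAL = Pattern.compile("TZID*:");

    public static void main(String[] args) {
        String[] inputs = {"TZI:", "TZID:", "TZIDDD:", "TZID=America/Mexico_City:"};
        for (String in : inputs) {
            // matches() asks whether the entire string fits the pattern.
            System.out.println(in + " -> " + ORIGINAL.matcher(in).matches());
        }
        // find() on the real line also fails: "TZID" is followed by '=', not by more Ds or ':'.
        String s = "DTSTART;TZID=America/Mexico_City:20121125T153000";
        System.out.println("find on s -> " + ORIGINAL.matcher(s).find());
    }
}
```

It prints true for "TZI:", "TZID:" and "TZIDDD:", and false for the TZID=America/Mexico_City: input and for the find() call on the full line.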

You should also add a capturing group; otherwise the match includes everything, including the "TZID=" and the ":".

Pattern p = Pattern.compile ("TZID=([^:]*):");

Finally, you should use the right API to search the string, rather than attempting to match the string in its entirety.

Pattern p = Pattern.compile("TZID=([^:]*):");
Matcher m = p.matcher("DTSTART;TZID=America/Mexico_City:20121125T153000");
if (m.find()) {
    System.out.println(m.group(1));
}

This prints

America/Mexico_City
Sergey Kalinichenko

You are using the wrong pattern; try this:

Pattern p = Pattern.compile(".*?TZID=([^:]+):.*");
Matcher m = p.matcher (s);
if (m.matches ())
    Log.d (TAG, "looking at " + m.group(1));

The leading .*? matches anything at the beginning up to TZID=. Then TZID= matches literally and a capturing group begins, matching everything up to the next :. The group closes there, the : matches, and .* matches the rest of the string. Because the pattern now covers the entire string, m.matches() succeeds, and what you need is in group(1).
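One caveat with this whole-string approach: by default . does not match line terminators, so a trailing '\r' defeats the final .* and matches() returns false. The sketch below assumes the line may still carry a '\r', for example if CRLF-terminated .ics text was split on "\n"; Pattern.DOTALL lets . match it. The class and helper names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WholeStringMatch {
    static final Pattern FULL = Pattern.compile(".*?TZID=([^:]+):.*");
    // Same pattern, but with DOTALL '.' may also match line terminators such as '\r'.
    static final Pattern FULL_DOTALL = Pattern.compile(".*?TZID=([^:]+):.*", Pattern.DOTALL);

    // Hypothetical helper: returns group(1) if the pattern matches the entire input, else null.
    static String tzidIfWholeMatch(Pattern p, String line) {
        Matcher m = p.matcher(line);
        return m.matches() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String clean = "DTSTART;TZID=America/Mexico_City:20121125T153000";
        // Assumption: the same line with a leftover carriage return.
        String withCr = clean + "\r";

        System.out.println(tzidIfWholeMatch(FULL, clean));         // America/Mexico_City
        System.out.println(tzidIfWholeMatch(FULL, withCr));        // null
        System.out.println(tzidIfWholeMatch(FULL_DOTALL, withCr)); // America/Mexico_City
    }
}
```

A find()-based pattern such as "TZID=([^:]+):" does not have this problem, since it never needs to consume the end of the line.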

Polyana Fontes

Why not simply use split, as follows:

  String origStr = "DTSTART;TZID=America/Mexico_City:20121125T153000";
  String str = origStr.split(":")[0].split("=")[1];
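The one-liner works for the sample line, but it assumes "TZID=" is the only "=" before the first colon. The sketch below compares it with a hypothetical helper that splits the part before the first ':' on ';' and looks for the TZID= parameter by name. The input with VALUE=DATE-TIME is a made-up line used only to show the difference; quoted parameter values are not handled.

```java
public class SplitTzid {
    // Hypothetical helper: split the name/parameter part (before the first ':') on ';'
    // and return the value of the TZID parameter, or null if it is absent.
    static String tzidByParams(String line) {
        int colon = line.indexOf(':');
        String head = colon >= 0 ? line.substring(0, colon) : line;
        for (String param : head.split(";")) {
            if (param.startsWith("TZID=")) {
                return param.substring("TZID=".length());
            }
        }
        return null;
    }

    // The one-liner from the answer above, for comparison.
    static String tzidByOneLiner(String line) {
        return line.split(":")[0].split("=")[1];
    }

    public static void main(String[] args) {
        String simple = "DTSTART;TZID=America/Mexico_City:20121125T153000";
        // Hypothetical line with an extra parameter before TZID.
        String withValue = "DTSTART;VALUE=DATE-TIME;TZID=America/Mexico_City:20121125T153000";

        System.out.println(tzidByOneLiner(simple));    // America/Mexico_City
        System.out.println(tzidByOneLiner(withValue)); // DATE-TIME;TZID
        System.out.println(tzidByParams(withValue));   // America/Mexico_City
    }
}
```

Note also that the one-liner throws ArrayIndexOutOfBoundsException when the part before the colon contains no "=", whereas the helper returns null.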
Yogendra Singh