
Given this PCRE pattern:

/(<name>[^<>]*<\/name>[^<>]*<phone>[^<>]*<\/phone>)/

And this subject text:

<name>John Stevens</name>  <phone>888-555-1212</phone>
<name>Peter Wilson</name>  
<phone>888-555-2424</phone>

How can I get the regular expression to match the first name-phone pair but not the second? I don't want to match pairs that are separated by line breaks. I tried including an end-of-line in the negated character class, like so: [^<>$]*, but nothing changed.
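A quick way to see the problem: the sketch below uses Python's `re` module as a stand-in for PCRE (it treats the constructs used here the same way). Python patterns are plain strings, so the `/.../` delimiters and the `\/` escapes are dropped. Both the original pattern and the `[^<>$]*` attempt find two pairs, the second one spanning the line break.

```python
import re

# The subject text from the question, with "\n" line breaks.
SUBJECT = (
    "<name>John Stevens</name>  <phone>888-555-1212</phone>\n"
    "<name>Peter Wilson</name>  \n"
    "<phone>888-555-2424</phone>\n"
)

# Original pattern: the [^<>]* between </name> and <phone> also accepts "\n".
original = re.compile(r"(<name>[^<>]*</name>[^<>]*<phone>[^<>]*</phone>)")
# Attempted fix: inside [...] the "$" is a literal dollar sign, not end-of-line.
attempted = re.compile(r"(<name>[^<>]*</name>[^<>$]*<phone>[^<>]*</phone>)")

original_matches = original.findall(SUBJECT)
attempted_matches = attempted.findall(SUBJECT)

print(len(original_matches))                  # 2
print(attempted_matches == original_matches)  # True
```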

You can use the following online tools to test your expressions:
http://rubular.com/
http://www.regextester.com/
Thank you.

Andrew
  • Inside a character class, the `$` loses its special meaning and becomes simply a literal dollar sign. What you want is `[^<>\r\n]`, as sawa suggests. – ridgerunner Apr 24 '11 at 04:14
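The comment's point can be checked directly. A small sketch with Python's `re` (standing in for PCRE, which treats `$` inside a class the same way):

```python
import re

# Outside a character class, "$" is a zero-width anchor (end of line/string).
anchor_hit = re.search(r"a$", "cba") is not None
# Inside a character class, "$" is just a literal dollar sign...
literal_dollar = re.fullmatch(r"[$]", "$") is not None
# ...so [^<>$] still accepts a newline character,
dollar_class_takes_newline = re.fullmatch(r"[^<>$]", "\n") is not None
# ...while [^<>\r\n] is the class that actually rejects CR and LF.
crlf_class_takes_newline = re.fullmatch(r"[^<>\r\n]", "\n") is not None

print(anchor_hit, literal_dollar, dollar_class_takes_newline, crlf_class_takes_newline)
# True True True False
```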

3 Answers


I think this will do it:

/<name>[^<>]*<\/name>[^<>\r\n]*<phone>[^<>]*<\/phone>/

Whatever you put inside a character class [ ] must represent a single character. Inside a class, $ is interpreted as a literal $, probably because $ as an end-of-line anchor is zero-width and so cannot stand for a character there. (Edited after comment by ridgerunner.)
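Here is a sketch of this pattern run against the question's subject text, using Python's `re` as a stand-in for PCRE (delimiters and `\/` escapes dropped). The CRLF check at the end is an extra illustration, not part of the answer.

```python
import re

SUBJECT = (
    "<name>John Stevens</name>  <phone>888-555-1212</phone>\n"
    "<name>Peter Wilson</name>  \n"
    "<phone>888-555-2424</phone>\n"
)

# The gap between </name> and <phone> may not contain "<", ">", CR or LF,
# so a pair split across lines cannot match.
PAIR_ON_ONE_LINE = re.compile(r"<name>[^<>]*</name>[^<>\r\n]*<phone>[^<>]*</phone>")

pairs = PAIR_ON_ONE_LINE.findall(SUBJECT)
print(pairs)
# ['<name>John Stevens</name>  <phone>888-555-1212</phone>']

# Windows-style line endings are rejected too, because "\r" is also excluded.
crlf_pairs = PAIR_ON_ONE_LINE.findall(SUBJECT.replace("\n", "\r\n"))
print(len(crlf_pairs))  # 1
```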

By the way, I removed the parentheses that surrounded your regex, because whatever matches it can be referred to as the whole match.
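A small illustration of the point about the parentheses, in Python's `re`. PCRE behaves the same way; PHP's `preg_match`, for example, puts the whole match in `$matches[0]`.

```python
import re

text = "<name>John Stevens</name>  <phone>888-555-1212</phone>"

with_group = re.search(r"(<name>[^<>]*</name>[^<>\r\n]*<phone>[^<>]*</phone>)", text)
without_group = re.search(r"<name>[^<>]*</name>[^<>\r\n]*<phone>[^<>]*</phone>", text)

# group(0) is always the entire match, so wrapping the whole pattern in a
# capturing group only duplicates it as group(1).
print(with_group.group(1) == with_group.group(0))     # True
print(without_group.group(0) == with_group.group(0))  # True
```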

sawa

If you don't want to match pairs separated by line breaks, then the following regex will do the job:

/(<name>[^<>]*<\/name>.*?<phone>[^<>]*<\/phone>)/

This matches only the first name-phone pair, since the dot (.) does not match a line break, whereas [^<>] does.

Tested it on http://rubular.com/r/amXvq20sl8
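A sketch of this pattern in Python's `re`, assuming the default flags. `re.DOTALL` is Python's counterpart of PCRE's `s` modifier; it is switched on at the end only to show what the dot would otherwise do.

```python
import re

SUBJECT = (
    "<name>John Stevens</name>  <phone>888-555-1212</phone>\n"
    "<name>Peter Wilson</name>  \n"
    "<phone>888-555-2424</phone>\n"
)

# ".*?" bridges </name> and <phone>.
PATTERN = r"(<name>[^<>]*</name>.*?<phone>[^<>]*</phone>)"

# Default flags: "." does not match "\n", so the bridge cannot cross a line
# break (whereas [^<>]* in the same position could).
lazy_matches = re.findall(PATTERN, SUBJECT)
print(lazy_matches)
# ['<name>John Stevens</name>  <phone>888-555-1212</phone>']

# With DOTALL, "." also matches "\n" and the split pair is found again.
dotall_matches = re.findall(PATTERN, SUBJECT, flags=re.DOTALL)
print(len(dotall_matches))  # 2
```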

anubhava
  • Thank you. But I also needed to exclude `<>` to prevent capturing other tags. – Andrew Apr 24 '11 at 04:56
  • It wouldn't really hurt to make it `[^<>]*` above; however, I think once we are already inside a tag, capturing everything up to its closing tag only needs `[^<]*`. – anubhava Apr 24 '11 at 05:06
  • Right, and I like that change. What I omitted from the subject text is that there could be other tags between name and phone that I don't want to capture if they're there, i.e. `MarkBill888...`. The `.*` would capture both names on that same line. I know I could make it lazy instead of greedy, but that could negatively affect other parts of my pattern. I think the `\r\n` as stated above will work for me, with the addition of your change: `[^<\r\n]`. – Andrew Apr 24 '11 at 13:54
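The concern in this comment thread can be sketched as follows. The sample line is hypothetical, made up to stand in for the garbled `MarkBill888...` example, and the strict pattern is only one reading of Andrew's final choice: `[^<]*` inside the tags and `[^<\r\n]*` between `</name>` and `<phone>`.

```python
import re

# Hypothetical line (not from the question): two names, then one phone.
line = "<name>Mark</name><name>Bill</name><phone>888-555-0000</phone>"

# Lazy-dot bridge: still spans both names, because the attempt starting at
# <name>Mark has to extend past <name>Bill</name> to reach <phone>.
lazy = re.findall(r"<name>[^<]*</name>.*?<phone>[^<]*</phone>", line)

# Assumed reading of the final choice: the bridge may not contain "<", CR or LF,
# so it cannot step over another tag or a line break.
strict = re.findall(r"<name>[^<]*</name>[^<\r\n]*<phone>[^<]*</phone>", line)

print(lazy)    # ['<name>Mark</name><name>Bill</name><phone>888-555-0000</phone>']
print(strict)  # ['<name>Bill</name><phone>888-555-0000</phone>']
```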

Those sites don't seem to support the whole PCRE syntax. I used this site: http://lumadis.be/regex/test_regex.php

And this worked:

/^(<name>[^<>]*<\/name>[^<>$]*<phone>[^<>]*<\/phone>)/

/(?-s)(<name>[^<>]*<\/name>.*<phone>[^<>]*<\/phone>)/

is probably better, since (?-s) turns off the s (dotall) modifier so that . cannot match a line break.
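A Python `re` sketch of both patterns. It suggests the first one works because of the `^` anchor (without the `m` modifier, `^` matches only at the start of the subject), not because of the `$` in the class. As far as I know, Python's `re` only accepts flag removal in the scoped form `(?-s:...)`, so that form is used here as an assumed equivalent of PCRE's global `(?-s)`.

```python
import re

SUBJECT = (
    "<name>John Stevens</name>  <phone>888-555-1212</phone>\n"
    "<name>Peter Wilson</name>  \n"
    "<phone>888-555-2424</phone>\n"
)

ANCHORED = r"^(<name>[^<>]*</name>[^<>$]*<phone>[^<>]*</phone>)"

# Without MULTILINE, "^" matches only at the very start of the subject.
anchored = re.findall(ANCHORED, SUBJECT)
# With MULTILINE, "^" also matches after each "\n", and the split pair returns.
anchored_multiline = re.findall(ANCHORED, SUBJECT, flags=re.MULTILINE)

# Scoped "(?-s:...)": "." inside it does not match "\n".
dot_no_newline = re.findall(r"(?-s:(<name>[^<>]*</name>.*<phone>[^<>]*</phone>))", SUBJECT)

print(len(anchored), len(anchored_multiline), len(dot_no_newline))  # 1 2 1
```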

Christo