How to exclude a line break from regex character class?

Question

Given this PCRE pattern:

/(<name>[^<>]*<\/name>[^<>]*<phone>[^<>]*<\/phone>)/

And this subject text:

<name>John Stevens</name>  <phone>888-555-1212</phone>
<name>Peter Wilson</name>  
<phone>888-555-2424</phone>

How can I get the regular expression to match the first name-phone pair but not the second? I don't want to match pairs that are separated by line breaks. I tried adding an end-of-line anchor to the negated character class, like so: [^<>$]*, but nothing changed.

You can use the following online tools to test your expressions:
http://rubular.com/
http://www.regextester.com/
Thank you.

Inside a character class, the `$` loses its special meaning and becomes simply a literal dollar sign. What you want is: `[^<>\r\n]` as sawa suggests. — ridgerunner, Apr 24 '11 at 04:14

sawa · Accepted Answer · 2011-04-24T04:29:42.313

4

I think this will do it:

/<name>[^<>]*<\/name>[^<>\r\n]*<phone>[^<>]*<\/phone>/

Whatever you put inside a class [ ] must represent a single character. Within a class, $ is interpreted as a literal $, probably because $ as a line-end anchor is zero-width and so cannot be treated as a character. (Edited after comment by ridgerunner)

By the way, I removed the parentheses surrounding your regex, because whatever it matches can already be referred to as the whole match.
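
To see the difference between the three classes, here is a minimal sketch in Python rather than PCRE (an assumption on my part; the `re` module treats these particular patterns the same way, and since Python has no `/` delimiters the `\/` escape is dropped). The variable names are only for illustration.

```python
import re

# Subject text from the question: the second pair is split across two lines.
subject = (
    "<name>John Stevens</name>  <phone>888-555-1212</phone>\n"
    "<name>Peter Wilson</name>  \n"
    "<phone>888-555-2424</phone>\n"
)

# Original pattern: [^<>] happily matches the line break between the tags.
original = re.compile(r"<name>[^<>]*</name>[^<>]*<phone>[^<>]*</phone>")

# Attempted fix: inside a class, $ is just a literal dollar sign.
with_dollar = re.compile(r"<name>[^<>]*</name>[^<>$]*<phone>[^<>]*</phone>")

# Suggested fix: exclude the line-break characters themselves.
fixed = re.compile(r"<name>[^<>]*</name>[^<>\r\n]*<phone>[^<>]*</phone>")

original_matches = original.findall(subject)
with_dollar_matches = with_dollar.findall(subject)
fixed_matches = fixed.findall(subject)

print(len(original_matches))     # both pairs match
print(len(with_dollar_matches))  # still both pairs
print(fixed_matches)             # only the pair on a single line
```

The first two patterns find both pairs, while the `\r\n` version finds only the John Stevens pair.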

edited Apr 24 '11 at 04:29

answered Apr 24 '11 at 03:58


+1 (but the `$` does have an effect inside a char class - it matches a dollar sign.) – ridgerunner Apr 24 '11 at 04:17
@ridgerunner Thanks for pointing that out. I will correct my answer. – sawa Apr 24 '11 at 04:18
I also added `\r` as pointed out by ridgerunner. I only had unix in mind. – sawa Apr 24 '11 at 04:31

anubhava · Answer 2 · 2011-04-24T14:58:38.317

1

If you don't want to match pairs separated by line breaks, then the following regex will do the job:

/(<name>[^<>]*<\/name>.*?<phone>[^<>]*<\/phone>)/

This matches only the first name-phone pair, since by default the dot . does not match EOL, whereas [^<>] does.

Tested it on http://rubular.com/r/amXvq20sl8
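
As a rough check of the dot behaviour outside Rubular, here is a small sketch using Python's `re` module (my assumption; the answer itself was tested in Ruby). It runs the same pattern with and without the flag that lets the dot match a newline (`re.DOTALL` in Python, the counterpart of PCRE's `/s`).

```python
import re

# Subject text from the question.
subject = (
    "<name>John Stevens</name>  <phone>888-555-1212</phone>\n"
    "<name>Peter Wilson</name>  \n"
    "<phone>888-555-2424</phone>\n"
)

pattern = r"<name>[^<>]*</name>.*?<phone>[^<>]*</phone>"

# Default mode: the dot stops at "\n", so the split pair cannot match.
default_matches = re.findall(pattern, subject)

# Dot-matches-newline mode: the dot crosses the line break again.
dotall_matches = re.findall(pattern, subject, re.DOTALL)

print(len(default_matches))
print(len(dotall_matches))
```

So the approach depends on the dot-matches-newline mode being off.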

edited Apr 24 '11 at 14:58

answered Apr 24 '11 at 04:19

Thank you. But I also needed to exclude `<>` to prevent capturing other tags. – Andrew Apr 24 '11 at 04:56
It wouldn't really hurt to make it `[^<>]*` above; however, I think once we are already inside an element, to capture everything up to its closing tag we just need `[^<]*` – anubhava Apr 24 '11 at 05:06
Right, and I like that change. What I omitted from the subject text is that there could be other tags between name and phone that I don't want to capture if they're there, i.e. `MarkBill888...`. The `.*` would capture both names on that same line. I know I could make it lazy instead of greedy, but that could negatively affect other parts of my pattern. I think the `\r\n` as stated above will work for me, with the addition of your change: `[^<\r\n]`. – Andrew Apr 24 '11 at 13:54
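
The concern about extra tags on the same line can be sketched as follows, again in Python as an assumed stand-in for PCRE. The sample line is hypothetical (two names followed by one phone, with a made-up number), not text from the question.

```python
import re

# Hypothetical line: two <name> elements before a single <phone>.
line = "<name>Mark</name><name>Bill</name><phone>888-555-0000</phone>"

# Dot version: even when lazy, .*? may run across the second <name> element.
dot_match = re.search(
    r"<name>[^<>]*</name>.*?<phone>[^<>]*</phone>", line
).group(0)

# Class version: [^<\r\n]* cannot cross a "<", so only the adjacent pair matches.
class_match = re.search(
    r"<name>[^<>]*</name>[^<\r\n]*<phone>[^<>]*</phone>", line
).group(0)

print(dot_match)
print(class_match)
```

The dot version returns the whole line, both names included, while the class version returns only the Bill pair.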

score 0 · Answer 3 · answered Apr 24 '11 at 04:27

Those sites don't seem to support the whole PCRE syntax. I used this site: http://lumadis.be/regex/test_regex.php

And this worked:

/^(<name>[^<>]*<\/name>[^<>$]*<phone>[^<>]*<\/phone>)/

/(?-s)(<name>[^<>]*<\/name>.*<phone>[^<>]*<\/phone>)/

is probably better
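
For anyone comparing these, a minimal sketch of the first pattern in Python's `re` module (an assumption; the test site above uses PHP). It suggests the leading `^` is what limits the result: without multiline mode it matches only at the very start of the subject, while `[^<>$]` still accepts line breaks. The `(?-s)` pattern is not exercised here.

```python
import re

# Subject text from the question.
subject = (
    "<name>John Stevens</name>  <phone>888-555-1212</phone>\n"
    "<name>Peter Wilson</name>  \n"
    "<phone>888-555-2424</phone>\n"
)

anchored = r"^(<name>[^<>]*</name>[^<>$]*<phone>[^<>]*</phone>)"

# Default mode: ^ matches only at the start of the whole subject.
default_matches = re.findall(anchored, subject)

# Multiline mode: ^ also matches after each line break, and the split pair returns.
multiline_matches = re.findall(anchored, subject, re.MULTILINE)

print(len(default_matches))
print(len(multiline_matches))
```

In multiline mode the second pair matches again, line break included.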
