-4

I need to make a regex that recognizes everything except text between quotes. Here is an example:

my_var == "Hello world!"

I want to get my_var but not Hello world!.

I tried (?<!\")([A-Za-z0-9]+) but it didn't work.

lobodart

4 Answers

2

If you had taken the time to search Google or Stack Overflow, you would have found that this question has already been answered, not only by me but by many other users.

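@Pappa's answer using a negative lookbehind will only match a simple test case, not everything in a string that is not enclosed in quotes. I would opt for a negative lookahead in this case, if you want to match all word characters in any given data. The lookahead rejects any token that is followed by an odd number of double quotes before the end of the string, that is, any token sitting inside a quoted section.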

/[\w.-]+(?![^"]*"(?:(?:[^"]*"){2})*[^"]*$)/


Example:

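<?php

// Nowdoc ('T') keeps the sample text literal, so the $ characters are never interpolated.
$text = <<<'T'
my_var == "Hello world!" foo /(^*#&^$
"hello" foobar "hello" FOO "hello" baz
Hi foo, I said "hello" $&@^$(@$)@$&*@(*$&
T;

// Match runs of word characters, dots, and hyphens that are NOT followed
// by an odd number of double quotes before the end of the string.
preg_match_all('/[\w.-]+(?![^"]*"(?:(?:[^"]*"){2})*[^"]*$)/', $text, $matches);
print_r($matches);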

Output

Array
(
    [0] => Array
        (
            [0] => my_var
            [1] => foo
            [2] => foobar
            [3] => FOO
            [4] => baz
            [5] => Hi
            [6] => foo
            [7] => I
            [8] => said
        )
)
hwnd
  • Hi hwnd, the solution you posted is definitely better than the one I initially posted, but I just noticed that it fails to match as expected if there are an odd number of quotes. I suppose that's out of the scope of the question though. – Pappa Oct 22 '13 at 18:47
  • Yea, OP didn't state whether he had nested quotes or what not. That is a whole different ballgame. – hwnd Oct 22 '13 at 19:17
2

You already have an accepted answer, but I am still submitting one because I believe this answer handles more edge cases:

$s = 'my_var == "Hello world!" foo';
if (preg_match_all('/[\w.-]+(?=(?:(?:[^"]*"){2})*[^"]*$)/', $s, $arr))
   print_r($arr[0]);

OUTPUT:

Array
(
    [0] => my_var
    [1] => foo
)

This works by using a lookahead to make sure each match is followed by an even number of double quotes up to the end of the string (it requires balanced double quotes and no escaping).
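To make the even-count idea concrete, here is a minimal sketch. The helper name `outsideQuotes` and the second sample string are illustrative assumptions, not part of the answer. It runs the same pattern on a balanced string and then on a string containing an escaped quote, where the "balanced, no escaping" requirement is violated.

```php
<?php

// Hypothetical helper: returns the tokens that are followed by an even
// number of double quotes up to the end of the string.
function outsideQuotes(string $s): array
{
    preg_match_all('/[\w.-]+(?=(?:(?:[^"]*"){2})*[^"]*$)/', $s, $m);
    return $m[0];
}

// Balanced quotes: only the tokens outside the quoted text are returned.
print_r(outsideQuotes('my_var == "Hello world!" foo'));      // my_var, foo

// An escaped quote still counts as a quote for this pattern, so the
// parity flips: my_var is dropped, while Hello and it are reported.
print_r(outsideQuotes('my_var == "Hello it\"s a world!" foo')); // Hello, it, foo
```

The second call shows why the pattern should only be trusted on input whose quotes are known to be balanced and unescaped.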

anubhava
  • You were a minute too slow. =) Positive lookahead will work in this case too. – hwnd Oct 21 '13 at 22:02
  • I was waiting for someone to comment what if input is something like: `'my_var == "Hello it\"s a world!" foo'` :P – anubhava Oct 21 '13 at 22:04
  • Some users do not think outside the box on realistic cases before they ask a specific question on a topic. :) – hwnd Oct 21 '13 at 22:07
1

As much as I'll regret getting downvoted for answering this, I was intrigued, so I did it anyway.

(?<![" a-zA-Z])([A-Za-z0-9\-_\.]+)
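As a rough check of where this lookbehind holds up, the sketch below runs it on the question's example and on a second input. The second test string is an assumption made up for illustration. The lookbehind only inspects the single character before a token, so it cannot really tell whether a token is inside quotes.

```php
<?php

$pattern = '/(?<![" a-zA-Z])([A-Za-z0-9\-_\.]+)/';

// The question's example: only my_var is found, as intended.
preg_match_all($pattern, 'my_var == "Hello world!"', $m);
print_r($m[1]); // my_var

// Illustrative input: foo is rejected because it follows a space, and
// the b inside the quotes slips through because it follows a digit.
preg_match_all($pattern, 'x=1 "a1b" foo', $m2);
print_r($m2[1]); // x, 1, b
```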
Pappa
  • It works! Thank you, and sorry for my poor question :( It was my first one... I swear I'll do better next time! – lobodart Oct 21 '13 at 21:28
  • @Pappa, this works for the beginning of the line and if his test string is simple like that, but won't work for all cases. See here: http://regex101.com/r/sP4rG6 – hwnd Oct 21 '13 at 21:31
  • Good point. I didn't really put too much effort into it, just a simple solution. There are some great answers on this page. – Pappa Oct 21 '13 at 22:30
0

This simple solution hasn't been mentioned:

"[^"]*"(*SKIP)(*F)|[\w.-]+

Reference

How to match pattern except in situations s1, s2, s3

zx81