13

I have a string like this:

"quick" "brown" fox jumps "over" "the" lazy dog

I need a regex that matches the words not enclosed in double quotes. After some random tries I found this: ("([^"]+)"). It matches a string enclosed in double quotes, but I want the opposite. I can't come up with it, even after trying to invert the regex above. I am quite weak at regex. Please help me.

Shades88

3 Answers

33

Use lookahead/lookbehind assertions:

(?<![\S"])([^"\s]+)(?![\S"])

Example:

>>> import re
>>> a='"quick" "brown" fox jumps "over" "the" lazy dog'
>>> print(re.findall(r'(?<![\S"])([^"\s]+)(?![\S"])', a))
['fox', 'jumps', 'lazy', 'dog']

The main thing here is the lookahead/lookbehind assertions. With them you can say: I want (or don't want) this symbol next to the expression, but it must not be part of the match itself. For example:

(?<![\S"])abc

That is a negative lookbehind. It means you want abc, but only when it is not preceded by a character from [\S"]: there must be no non-space character and no " immediately before it, so abc has to start a word (after whitespace or at the beginning of the string).

That is the same but in the other direction:

abc(?![\S"])

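That is a negative lookahead. It means you want abc, but only when it is not followed by a character from [\S"], so abc has to end a word (before whitespace or at the end of the string).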
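To see what each assertion contributes on its own, here is a small sketch. The test string and the `starts` helper are made up for illustration (they are not from the question); the helper only reports the index where each match begins.

```python
import re

# Hypothetical test string: abc appears at indexes 0, 5, 10, 14 and 19
text = 'abc xabc "abc abcx abc"'

def starts(pattern, s):
    """Return the start index of every match of pattern in s."""
    return [m.start() for m in re.finditer(pattern, s)]

# Lookbehind only: abc must not be preceded by a non-space character
behind_only = starts(r'(?<![\S"])abc', text)

# Lookahead only: abc must not be followed by a non-space character
ahead_only = starts(r'abc(?![\S"])', text)

# Both together: abc must stand alone as a whole word
both = starts(r'(?<![\S"])abc(?![\S"])', text)

print(behind_only)  # [0, 14, 19]
print(ahead_only)   # [0, 5, 10]
print(both)         # [0]
```

Only the first abc passes both checks, which is why the full pattern needs the assertion on each side.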

There are four different assertions of this type in general:

(?=pattern)
    is a positive look-ahead assertion
(?!pattern)
    is a negative look-ahead assertion
(?<=pattern)
    is a positive look-behind assertion
(?<!pattern)
    is a negative look-behind assertion 
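
A minimal sketch of the four types with Python's re module. The string `a1 b2 c3` is a made-up sample, chosen only to keep the results easy to check:

```python
import re

s = 'a1 b2 c3'  # hypothetical sample string

# (?=...)  positive look-ahead: a character that IS followed by '2'
pos_ahead = re.findall(r'\w(?=2)', s)

# (?!...)  negative look-ahead: a character that is NOT followed by a digit
neg_ahead = re.findall(r'\w(?!\d)', s)

# (?<=...) positive look-behind: a digit that IS preceded by 'b'
pos_behind = re.findall(r'(?<=b)\d', s)

# (?<!...) negative look-behind: a digit that is NOT preceded by 'b'
neg_behind = re.findall(r'(?<!b)\d', s)

print(pos_ahead)   # ['b']
print(neg_ahead)   # ['1', '2', '3']
print(pos_behind)  # ['2']
print(neg_behind)  # ['1', '3']
```

In every case the asserted character is checked but never included in the match. Note that Python's re only accepts fixed-width patterns inside a look-behind.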
Igor Chubin
  • 1
    thanks a lot, that worked like magic :) Just one more favor, can you explain it a bit? Looks quite a bit complex – Shades88 Jul 04 '12 at 08:11
  • 1
    c'mon...you know their type as `lookahead/lookbehind assertions`. you can definitely explain me how that is working !! – Shades88 Jul 04 '12 at 08:18
  • 1
    Does Python support a negation of regexps, like `findAllExcept(/pattern/)`? – gaussblurinc Jul 04 '12 at 10:04
  • 1
    @loldop: what do you mean, "find all except"? You can use the same re, but get a list of indexes instead of a list of strings (something like (a,b), (c,d) and so on), and then (0,a), (b,c), (d,-1) is what you are looking for. Right? – Igor Chubin Jul 04 '12 at 10:21
  • 1
    i mean, if i have simple condition, i will find all words, that DON'T match this condition – gaussblurinc Jul 04 '12 at 10:41
  • @IgorChubin, can we improve this to exclude single-quoted strings as well? I tried (?<![\\S\"[\\S']])([^\"\\s]+)(?![\\S\"[\\S']]) in vain. – NishM Jun 20 '16 at 22:48
  • @NishM: Why double brackets? Just write ["\'] – Igor Chubin Jun 21 '16 at 10:03
  • The double quote seems redundant in `[\S"]` since non-space characters include `"`. This seems to do the same `(?<=\s)([^"\s]+)(?=\s)`. – Scratte Sep 16 '21 at 17:01
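
Regarding the last comment, a quick check against the string from the question (a sketch, not an exhaustive test) suggests that the `"` in `[\S"]` is indeed redundant, but that the positive-lookaround variant is not fully equivalent. It requires an actual whitespace character on both sides, so a word at the very start or end of the string is skipped.

```python
import re

a = '"quick" "brown" fox jumps "over" "the" lazy dog'

# Pattern from the answer
original = re.findall(r'(?<![\S"])([^"\s]+)(?![\S"])', a)

# Same pattern without the redundant quote in the character class
simplified = re.findall(r'(?<!\S)([^"\s]+)(?!\S)', a)

# Positive lookarounds, as suggested in the comment
positive = re.findall(r'(?<=\s)([^"\s]+)(?=\s)', a)

print(original)    # ['fox', 'jumps', 'lazy', 'dog']
print(simplified)  # ['fox', 'jumps', 'lazy', 'dog']
print(positive)    # ['fox', 'jumps', 'lazy']
```

The negative forms also succeed at the string boundaries, because "no non-space character here" is true when there is no character at all.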
0

Use this regex:

\s+(?<myword>([^\"\s]+)*)\s+

This should work: read the group named myword. Otherwise (if you take the whole match) you need to trim the result string.
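
A quick check of this pattern in Python, as a sketch. Python's re module does not accept the `(?<myword>...)` group syntax, so it is rewritten here as `(?P<myword>...)`; that rewrite is an assumption about the intent. Because each match consumes the whitespace on both sides, two unquoted words in a row share a single space and the second one is skipped, and a word at the end of the string has no trailing whitespace to match.

```python
import re

a = '"quick" "brown" fox jumps "over" "the" lazy dog'

# Named group rewritten in Python syntax (assumption: same intent as above)
pattern = re.compile(r'\s+(?P<myword>([^"\s]+)*)\s+')

# Collect the named group from every non-overlapping match
words = [m.group('myword') for m in pattern.finditer(a)]

print(words)  # ['fox', 'lazy']
```

On this input, 'jumps' and 'dog' are missed, which the lookaround-based pattern avoids because assertions do not consume characters.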

Ria
-3

Remove the first quote from the string

Vilius Gaidelis