0

I need to find some text with a regex, say "andres", but only where it is NOT between square brackets [].

For example, if the text is:

s = 'andres [andres andres] andres [andres] andresX andres' 

I should get the first, fourth, sixth and last occurrences; the others sit inside [] and so must not match.

I tried this:

"[^\[]andres[^\]]"

but it does not work.

A fuller example: http://jsfiddle.net/aras7/5j3UM/8/

Andres

4 Answers

2

There is a useful pattern for doing this sort of thing in regex:

exclusion_context1|exclusion_context2|...|(stuff_you_want)

You can specify as many exclusion contexts as you want, and at the end capture the stuff you do want inside a capturing group. Anything matched by an exclusion context leaves that group empty, so you just ignore those matches. I could explain further, but really I'll just link you to this answer, which goes into great depth about the above pattern.

So, then:

\[.*?\]|(andres)

Here the exclusion context lazily matches anything inside brackets, and otherwise we capture every andres outside of that context.

Since I just noticed you wanted the positions of the matches, it might look something like this in Python:

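import re

s = 'andres [andres andres] andres [andres] andresX andres'
for m in re.finditer(r'\[.*?\]|(andres)', s):
    if m.group(1):  # group 1 is set only for matches outside brackets
        print('{}: {}'.format(m.start(), m.group()))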

0: andres
23: andres
39: andres
47: andres
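
If you want to replace the matches rather than locate them, the same pattern can be used with a replacement callback that hands bracketed text back untouched. A minimal Python sketch, assuming brackets are balanced and not nested; the helper name replace_outside_brackets is made up for illustration:

```python
import re

def replace_outside_brackets(text, word, replacement):
    """Replace `word` with `replacement`, skipping anything inside [...]."""
    # Exclusion context first, then the wanted text in capturing group 1.
    pattern = re.compile(r'\[.*?\]|(' + re.escape(word) + r')')

    def swap(m):
        # Group 1 is None when the bracketed alternative matched,
        # so return that whole match unchanged.
        return replacement if m.group(1) is not None else m.group(0)

    return pattern.sub(swap, text)

s = 'andres [andres andres] andres [andres] andresX andres'
print(replace_outside_brackets(s, 'andres', 'XX'))
# XX [andres andres] XX [andres] XXX XX
```

Only the text captured by group 1 is rewritten; whatever the exclusion context matched is returned as-is.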
roippi
  • the group at line 39 should write "andres" instead of "andresX" – Andres May 14 '14 at 19:03
  • oh. well, just.. put `andres` into the regex in place of the previous capturing group. Edited. – roippi May 14 '14 at 19:04
  • If I add "andres" it will only return (0, 23 and 47). I would also like to match text with spaces, like "andres arrieche" – Andres May 14 '14 at 19:07
  • forget my last comment, I thought you meant to put "\bandres\b" but now your answer is right – Andres May 14 '14 at 19:13
  • Ah, I see what you meant then. – roippi May 14 '14 at 19:13
  • I created an example where it fails. I'll edit my question to add it. I'll also remove the python tag because my example is JavaScript – Andres May 14 '14 at 19:22
  • It's failing because you're replacing things matched by the main expression as well as the capturing group - you **only** want to replace things matched by the capturing group. – roippi May 14 '14 at 19:27
  • for example here http://jsfiddle.net/aras7/5j3UM/10/ it "works" but does not change the last – Andres May 14 '14 at 19:35
1

Try this. The test string is:

$string = 'andres [an1dres an1dres] andres [an1dres] andresX andres' ;

$pattern = '/\\[.*?\\]| /';
volkinc
  • I need to specify a particular string, in the example I used my name but it can be whatever – Andres May 14 '14 at 18:42
1

You can use the following:

\w+(?![^\[]*\])
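
Since the question asks for one particular string rather than any word, \w+ can be swapped for the escaped search text. A small Python sketch under that assumption (the helper name find_outside_brackets is hypothetical, and the lookahead is only reliable when brackets are balanced and not nested):

```python
import re

def find_outside_brackets(text, word):
    """Return the start offsets of `word` occurrences outside [...]."""
    # Negative lookahead: reject a match if a ']' can be reached
    # from here without first crossing a '['.
    pattern = re.compile(re.escape(word) + r'(?![^\[]*\])')
    return [m.start() for m in pattern.finditer(text)]

s = 'andres [andres andres] andres [andres] andresX andres'
print(find_outside_brackets(s, 'andres'))  # [0, 23, 39, 47]
```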

Ulugbek Umirov
0

Try:

andres(?=[^[\]]+(?:\[|$))
  • You may or may not want to set the case insensitive option.
  • You may or may not need to set the ^$ match at line break option.

Explanation

Look for any instance of the string that is followed by

  • a series of characters that are neither opening nor closing brackets, and then by
  • either an opening bracket or the end of the string.

Explanation from RegexBuddy:

andres not between [ ... ]

andres(?=[^[\]]+(?:\[|$))

Options: Case sensitive; ^$ match at line breaks

Match the character string “andres” literally (case sensitive) «andres»
Assert that the regex below can be matched, starting at this position (positive lookahead) «(?=[^[\]]+(?:\[|$))»
   Match any single character NOT present in the list below «[^[\]]+»
      Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»
      The literal character “[” «[»
      The literal character “]” «\]»
   Match the regular expression below «(?:\[|$)»
      Match this alternative (attempting the next alternative only if this one fails) «\[»
         Match the character “[” literally «\[»
      Or match this alternative (the entire group fails if this one fails to match) «$»
         Assert position at the end of a line (at the end of the string or before a line break character) (line feed, line feed, line separator, paragraph separator) «$»

Created with RegexBuddy
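
Running the pattern against the string from the question is a quick way to check it. In the Python sketch below (the function name and the `*` variant are illustrative additions, not part of the pattern as posted), the `+` quantifier requires at least one character after the match, so an occurrence at the very end of the string is skipped; with `*` it is picked up:

```python
import re

s = 'andres [andres andres] andres [andres] andresX andres'

def starts(pattern, text):
    """Return the start offset of every match of `pattern` in `text`."""
    return [m.start() for m in re.finditer(pattern, text)]

# The pattern as posted, with '[' escaped inside the character class.
posted = r'andres(?=[^\[\]]+(?:\[|$))'
# Assumed variant: '*' lets the lookahead succeed right at end of string.
relaxed = r'andres(?=[^\[\]]*(?:\[|$))'

print(starts(posted, s))   # [0, 23, 39]
print(starts(relaxed, s))  # [0, 23, 39, 47]
```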
Ron Rosenfeld