I have a pattern 'NewTree' and I want to get all strings that don't contain it. How can I do this filtering with a regex?

So if I have 1.BoostKite 2.SetTree 3. ComeNewTreeNow

Then the output should be BoostKite and SetTree. Any suggestions? I want a regex that works anywhere and does not rely on any language-specific function.

vkaul11

3 Answers

If you want to use a regular expression, you can use a negative lookahead:

^(?!.*NewTree).*$

Alternatively, you can use the alternation operator: put what you want to exclude on the left side (in effect saying "throw this away, it's garbage") and put what you want to match in a capturing group on the right side.

\w*NewTree\w*|([a-zA-Z]+)

In Python:

(Assuming the strings are in a list, since you mentioned an 'array' in the comments.)

>>> import re
>>> regex = re.compile(r'^(?!.*NewTree).*$')
>>> mylst = ['BoostKite', 'SetTree', 'ComeNewTree', 'NewTree']
>>> matches = [x for x in mylst if regex.match(x)]
>>> matches
['BoostKite', 'SetTree']

If it is just one long string of multiple words and you want to ignore the words that contain NewTree:

>>> s = '1.BoostKite 2.SetTree 3. ComeNewTreeNow 4. foo 5. bar'
>>> list(filter(None, re.findall(r'\w*NewTree\w*|([a-zA-Z]+)', s)))
['BoostKite', 'SetTree', 'foo', 'bar']

You can do this without a regular expression as well.

>>> mylst = ['BoostKite', 'SetTree', 'ComeNewTree', 'NewTree']
>>> matches = [x for x in mylst if "NewTree" not in x]
>>> matches
['BoostKite', 'SetTree']
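
The lookahead pattern above hard-codes NewTree. Here is a minimal sketch of generalizing it to any literal substring, assuming Python's re module. The helper name lines_without is hypothetical and not part of the question:

```python
import re

def lines_without(needle, candidates):
    """Return the candidates that do not contain the literal `needle`."""
    # re.escape keeps any regex metacharacters in the needle literal.
    # (?s) lets '.' match newlines, so the lookahead scans the whole string.
    pattern = re.compile(r'(?s)^(?!.*' + re.escape(needle) + r').*$')
    return [c for c in candidates if pattern.match(c)]

print(lines_without('NewTree', ['BoostKite', 'SetTree', 'ComeNewTreeNow']))
print(lines_without('a.b', ['a.b', 'axb']))
```

Escaping matters once the excluded text contains characters such as a dot. Without it, 'a.b' would also exclude 'axb'.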
hwnd

Match each word against the regex \w+NewTree\b. It matches only if the word ends with NewTree and has at least one character before it.

Use the i modifier for a case-insensitive match (letters then match regardless of case).

Use \w* instead of \w+ in the regex above if you also want to match the bare word NewTree.

If you are looking for words that contain NewTree anywhere, try the regex \w*NewTree\w*\b
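
These patterns find the words that do contain NewTree, so filtering means keeping the words for which the search fails. Here is a rough sketch in Python, assuming the words are already split into a list. The variable names and sample words are illustrative only:

```python
import re

words = ['BoostKite', 'SetTree', 'ComeNewTreeNow', 'ComeNewTree',
         'comenewtree', 'NewTree']

ends_with = re.compile(r'\w+NewTree\b')            # ends with NewTree, needs a prefix
ends_with_or_equals = re.compile(r'\w*NewTree\b')  # also the bare word NewTree
contains = re.compile(r'\w*NewTree\w*\b')          # NewTree anywhere in the word
contains_any_case = re.compile(r'\w*NewTree\w*\b', re.IGNORECASE)

# Keep only the words that the "contains" pattern does NOT find.
kept = [w for w in words if not contains.search(w)]
kept_any_case = [w for w in words if not contains_any_case.search(w)]

print(kept)
print(kept_any_case)
print([w for w in words if ends_with.search(w)])
print([w for w in words if ends_with_or_equals.search(w)])
```

The case-sensitive filter keeps comenewtree, while the case-insensitive one drops it.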

Braj

I think you can do this in general, without lookarounds, along the lines of the following example for your specific case:

^(([^N]|N[^e]|Ne[^w]|New[^T]|NewT[^r]|NewTr[^e]|NewTre[^e])+)?(.|..|...|....|.....)?$

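So far what I have here is a near miss. It rejects NewTree itself and most strings that contain it, but not all of them: NNewTree still matches, because N[^e] consumes NN and the remaining characters each pass as [^N]. It also fails to match some strings that are free of the substring NewTree. In particular it will not match abcNewTre, where the trailing partial NewTre is longer than the five characters the final group allows.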
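
One way to see where a hand-built pattern like this falls short is to compare it with a plain substring test on a few inputs. Here is a minimal sketch in Python. The sample strings and the disagreements helper are made up for illustration:

```python
import re

# The lookahead-free pattern from this answer, copied verbatim.
pattern = re.compile(
    r'^(([^N]|N[^e]|Ne[^w]|New[^T]|NewT[^r]|NewTr[^e]|NewTre[^e])+)?'
    r'(.|..|...|....|.....)?$'
)

def disagreements(samples):
    """Return (string, wanted, got) where regex and substring test differ."""
    out = []
    for s in samples:
        wanted = 'NewTree' not in s         # ground truth: should it match?
        got = pattern.match(s) is not None  # what the regex actually does
        if wanted != got:
            out.append((s, wanted, got))
    return out

samples = ['BoostKite', 'SetTree', 'ComeNewTreeNow', 'NewTree',
           'NNewTree', 'abcNewTre']
for s, wanted, got in disagreements(samples):
    print(s, '| should match:', wanted, '| regex matched:', got)
```

A check like this only covers the inputs you feed it, so it can reveal gaps in the pattern but cannot prove the pattern correct.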

ajm475du
  • Modified to permit all words shorter than NewTree, intentionally avoiding the use of curly braces. – ajm475du Jul 10 '14 at 22:09
  • You say "insane." I say "literally what the question asks for" so I'm still working on this answer. There are plenty of "regex" languages that don't provide negative lookbehind etc. etc. – ajm475du Jul 10 '14 at 22:30
  • I do agree with you =) – hwnd Jul 10 '14 at 22:32