Regex: How to match the complement of a pattern

Question

I am getting strings from text files that contain newline characters (\N in this case) and other substrings that I don't want to keep. In the case of a newline character, I can use...

re.search('\\\\N', string)

That matches them, but I'd like to know how to match the rest of the string. As I said, I need to do the same with other substrings. I tried:

re.search('^\\\\N', string)

But this returned no match. I guess it actually tried to match an 'N' that's preceded by an '\', which in turn is preceded by any character other than a '\'.

How can I match anything that doesn't match the regex I'm passing?

Regex allows you to perform negative pattern matching (i.e. match when the pattern is not present). However, it's not clear what pattern you don't want to match. — DarrylG, Apr 19 '20 at 00:28
@DarrylG In one of many files, I have the string 'May 10th. Thank god for the rain\Nwhich has helped wash away.' Now, I want to match everything but the '\N'. It is read as '\\N' and I don't want to match it. There are other patterns I don't want to match, but I'm sure if I know how to do it with this one, the most common one I get, I'll know how to do it with any other. — TheSprinter, Apr 19 '20 at 00:37
@Nick Well, how dumb of me, that'll surely do the trick. Maybe I was too focused on how to not match a pattern. Actually, I'd still like to learn how to do it. — TheSprinter, Apr 19 '20 at 00:45
@TheSprinter--if you're reading in a file line by line (i.e. `for line in fhandler`, where fhandler is the result of open), then `line = line.rstrip()` is normally used to remove the '\n' at the end of each line. — DarrylG, Apr 19 '20 at 01:13
@DarrylG The newline character in my case is not that newline character that's added at the end of each line read from a text file. This newline character comes in the middle of the lines for the format of the text file that's generated for substation alpha subtitles. — TheSprinter, Apr 19 '20 at 01:28
In that case, you may want to use Nick's suggestion or simply [string replace](https://www.geeksforgeeks.org/python-string-replace/). — DarrylG, Apr 19 '20 at 01:40
`\N` is **not** a linefeed, linefeed is `\n`. In PCRE `\N` means anything that is **not** a linefeed, in Python it simply means `N` — Toto, Apr 19 '20 at 10:00
@Toto Thank you very much. Yes, I see I didn't choose my words very well. But please note, this is not intended for Python to see as a newline character. It's always read as just '\\N', which in the Substation Alpha subtitle format means that a line break was found. — TheSprinter, Apr 19 '20 at 16:22

Booboo · Answer 1 · 2020-04-19T12:12:19.050

I will assume that you want to do this matching on a line-by-line basis. The easiest way to show how is with an example. Let's say I have the following file, test.txt:

{'name': 'Bryan', 'age': 34, 'male': True, 'hometown': 'Boston'}
{'name': 'Anna', 'age': 25, 'male': False, 'hometown': 'Chicago'}
{'name': 'Jeff', 'age': 47, 'male': True, 'hometown': 'Vancouver'}
{'name': 'Maria', 'age': 58, 'male': False, 'hometown': 'Madrid'}

For each line I want to match whatever does not match the regular expression:

r" 'age': \d+,"

So for the first line, that would be:

{'name': 'Bryan', 'male': True, 'hometown': 'Boston'}

In essence we are just replacing the regular expression r" 'age': \d+," with an empty string, so:

import re

pattern = re.compile(r" 'age': \d+,")

with open('test.txt') as f:
    for line in f:
        line = pattern.sub(r'', line)
        print(line, end='')

Prints:

{'name': 'Bryan', 'male': True, 'hometown': 'Boston'}
{'name': 'Anna', 'male': False, 'hometown': 'Chicago'}
{'name': 'Jeff', 'male': True, 'hometown': 'Vancouver'}
{'name': 'Maria', 'male': False, 'hometown': 'Madrid'}

Summary

Search for your regex and replace it with an empty string. What's left is equivalent to having matched the complement of the regex.
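
Applied to the string quoted in the question's comments, the same idea might look like the sketch below. It assumes the text contains a literal backslash followed by `N`, as described for the Substation Alpha format; the variable names are illustrative only. `re.sub` drops the unwanted pattern, while `re.split` returns the pieces between the matches, which is the complement as a list.

```python
import re

# Sample line from the question's comments. The raw string keeps the
# backslash literal, so the text contains the two characters '\' and 'N'.
text = r"May 10th. Thank god for the rain\Nwhich has helped wash away."

# r"\\N" is the regex for a literal backslash followed by 'N'.
unwanted = re.compile(r"\\N")

# Remove every match; what remains is everything the pattern did not match.
cleaned = unwanted.sub("", text)

# Or keep the unmatched pieces separate instead of joining them.
pieces = unwanted.split(text)

print(cleaned)
print(pieces)
```

For subtitle text it may be preferable to substitute a space rather than an empty string, so that the words on either side of the break do not run together.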

I only just now saw that this method was suggested by @nick in a comment and was now debating whether I should just delete this answer. But I have decided to leave it since "it can't hurt." — Booboo, Apr 19 '20 at 12:11
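
As for the attempt in the question: outside a character class, `^` is an anchor for the start of the string, not a negation, so `'^\\\\N'` can only match when the string begins with `\N`. A small check, using a made-up sample string:

```python
import re

# Hypothetical sample: a literal backslash + 'N' in the middle of the text.
sample = r"rain\Nwhich"

# '^' anchors the match at the start of the string, so this finds nothing.
anchored = re.search(r"^\\N", sample)

# Without the anchor, the literal \N is found where it actually occurs.
unanchored = re.search(r"\\N", sample)

# '^' negates only inside brackets: [^\\]+ matches a run of
# characters that are not backslashes.
first_run = re.search(r"[^\\]+", sample)

print(anchored)            # None
print(unanchored.span())   # (4, 6)
print(first_run.group())   # rain
```

A negated character class excludes single characters only, not a multi-character pattern such as `\N`, which is why substituting or splitting on the pattern is usually the simpler route.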
