Regex invert selection

Question

I looked around quite a bit but was unable to find an answer to this question.

I'm trying to select everything from a string except runs of white space that repeat more than a certain number of times. I've found a regex that selects those runs, and I was hoping for an easy way to get its exact inverse, but I haven't found one yet. I'm ultimately going to implement this in Python, if that matters.

Below is my test string, current regex, and link to the regex test site I was using.

Current regex

test string:

'All: Day and Night                                                                                                                                                                                                                                             Vulnerabilities\\Personnel vulnerabilities\\Outdoor vulnerability                                                                                                                                                                                                1E-09                                                                                                                                                                                                                                                          /AvgeYear                                                                                                                                                                                                                                                      \x1a'

Regex:

[ ]{50,}

Like this? `\S+(?:[ ]{1,49}\S+)*` https://regex101.com/r/ZKq68U/1 — The fourth bird, May 06 '21 at 15:21

PacketLoss · Answer 1 · 2021-05-06T15:33:43.347


You could use the same regex and replace each match with a single space using re.sub.

import re
re.sub(r'[ ]{50,}', ' ', string)  # string is the test string from the question
#'All: Day and Night Vulnerabilities\\Personnel vulnerabilities\\Outdoor vulnerability 1E-09 /AvgeYear \x1a'

If you want a list instead, use the same regex with re.split() in place of re.sub():

re.split(r'[ ]{50,}', string)
#['All: Day and Night', 'Vulnerabilities\\Personnel vulnerabilities\\Outdoor vulnerability', '1E-09', '/AvgeYear', '\x1a']
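
One caveat with re.split(): if the string starts or ends with a long run of spaces, the result contains empty strings. A minimal sketch of one way to drop them follows. The helper name `split_on_long_runs` and its `min_run` parameter are hypothetical, introduced only for illustration, and the sample is a shortened stand-in for the test string.

```python
import re


def split_on_long_runs(text, min_run=50):
    """Split text on runs of at least min_run spaces, dropping empty pieces.

    Hypothetical helper for illustration; not part of the re module.
    """
    pattern = rf"[ ]{{{min_run},}}"  # builds "[ ]{50,}" when min_run is 50
    return [part for part in re.split(pattern, text) if part]


# Shortened sample that ends with a long run of spaces.
sample = "All: Day and Night" + " " * 60 + "1E-09" + " " * 60
print(split_on_long_runs(sample))
```

Without the filter, a plain re.split() on this sample would end with an empty string, because the trailing run of spaces is itself a separator.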

edited May 06 '21 at 15:33

answered May 06 '21 at 15:22


this also gets me there, though I get a single string returned, which isn't ideal, but I should have specified – SuperStew May 06 '21 at 15:31
@SuperStew Do you want a `list`? – PacketLoss May 06 '21 at 15:32
yea that's better for the rest of my code – SuperStew May 06 '21 at 15:33
@SuperStew Updated to output list. – PacketLoss May 06 '21 at 15:33

The fourth bird · Accepted Answer · 2021-05-06T15:36:33.130

You could match 1+ non-whitespace chars, optionally followed by repetitions of 1-49 spaces and 1+ non-whitespace chars.

\S+(?:[ ]{1,49}\S+)*


Example

from pprint import pprint
import re

regex = r"\S+(?:[ ]{1,49}\S+)*"
s = "All: Day and Night                                                                                                                                                                                                                                             Vulnerabilities\\\\Personnel vulnerabilities\\\\Outdoor vulnerability                                                                                                                                                                                                1E-09                                                                                                                                                                                                                                                          /AvgeYear                                                                                                                                                                                                                                                      \\x1a'"

pprint(re.findall(regex, s))

Output

['All: Day and Night',
 'Vulnerabilities\\\\Personnel vulnerabilities\\\\Outdoor vulnerability',
 '1E-09',
 '/AvgeYear',
 "\\x1a'"]

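As a rough check that this pattern acts as the inverse of `[ ]{50,}`, the sketch below builds both patterns from one threshold and compares re.findall() with re.split() on a shortened sample. The names `min_run`, `gap_pattern`, and `keep_pattern` are assumptions made for illustration. The two approaches can differ on inputs with leading or trailing spaces, or with tabs and newlines, because `\S+` stops at any whitespace.

```python
import re

min_run = 50  # threshold from the question: runs this long or longer are gaps

# Gap pattern from the question and the "inverse" pattern from this answer,
# both derived from the same threshold.
gap_pattern = rf"[ ]{{{min_run},}}"                   # "[ ]{50,}"
keep_pattern = rf"\S+(?:[ ]{{1,{min_run - 1}}}\S+)*"  # "\S+(?:[ ]{1,49}\S+)*"

# Shortened stand-in for the test string.
sample = "All: Day and Night" + " " * 60 + "1E-09" + " " * 55 + "/AvgeYear"

found = re.findall(keep_pattern, sample)  # match the parts to keep
split = re.split(gap_pattern, sample)     # split on the parts to discard

print(found)
print(found == split)
```

On this sample both calls give the same three fields, so either can be used depending on whether it is easier to describe what to keep or what to discard.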