I looked around quite a bit but was unable to find an answer to this question.

I'm trying to select everything from a string except runs of spaces that repeat more than a certain number of times. I've found a regex that selects those runs of spaces, and I was hoping for an easy way to get the exact inverse of it, but I haven't found one yet. I'm ultimately going to implement this in Python, if that matters.

Below is my test string, current regex, and link to the regex test site I was using.

Current regex

test string:

'All: Day and Night                                                                                                                                                                                                                                             Vulnerabilities\\Personnel vulnerabilities\\Outdoor vulnerability                                                                                                                                                                                                1E-09                                                                                                                                                                                                                                                          /AvgeYear                                                                                                                                                                                                                                                      \x1a'

Regex:

[ ]{50,}
SuperStew

2 Answers

You could use the same regex and use re.sub() to replace each match with a single space.

re.sub(r'[ ]{50,}', ' ', string)
#'All: Day and Night Vulnerabilities\\Personnel vulnerabilities\\Outdoor vulnerability 1E-09 /AvgeYear \x1a'

If you want the pieces as a list, use the same regex with re.split() instead of re.sub().

re.split(r'[ ]{50,}', string)
#['All: Day and Night', 'Vulnerabilities\\Personnel vulnerabilities\\Outdoor vulnerability', '1E-09', '/AvgeYear', '\x1a']
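
One thing to watch with re.split(): if the string starts or ends with a long run of spaces, the result contains empty strings. The sketch below is only an illustration of one way to handle that; the helper name split_on_long_runs and its min_run parameter are made up for this example and are not part of the re module.

```python
import re

def split_on_long_runs(text, min_run=50):
    """Split text on runs of at least min_run spaces, dropping empty pieces.

    Hypothetical helper for illustration; min_run=50 mirrors the
    [ ]{50,} threshold used in the question.
    """
    pattern = re.compile(r" {%d,}" % min_run)
    # re.split() yields '' when the text starts or ends with a long run,
    # so filter those out.
    return [piece for piece in pattern.split(text) if piece]

# Shortened stand-in for the test string, with a trailing run of spaces.
sample = "All: Day and Night" + " " * 60 + "1E-09" + " " * 60 + "/AvgeYear" + " " * 60
fields = split_on_long_runs(sample)
print(fields)
```

Runs shorter than min_run are left untouched, so single spaces inside a field such as 'All: Day and Night' survive the split.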
PacketLoss

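To match the fields directly instead of the separators, you could match one or more non-whitespace characters, optionally followed by any number of repetitions of 1 to 49 spaces and one or more non-whitespace characters. Because a run of spaces inside a match can never exceed 49, each match stops at the first run of 50 or more.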

\S+(?:[ ]{1,49}\S+)*


Example

from pprint import pprint
import re

regex = r"\S+(?:[ ]{1,49}\S+)*"
s = "All: Day and Night                                                                                                                                                                                                                                             Vulnerabilities\\\\Personnel vulnerabilities\\\\Outdoor vulnerability                                                                                                                                                                                                1E-09                                                                                                                                                                                                                                                          /AvgeYear                                                                                                                                                                                                                                                      \\x1a'"

pprint(re.findall(regex, s))

Output

['All: Day and Night',
 'Vulnerabilities\\\\Personnel vulnerabilities\\\\Outdoor vulnerability',
 '1E-09',
 '/AvgeYear',
 "\\x1a'"]
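
If the threshold needs to vary, the same pattern can be built from a parameter. This is a minimal sketch under that assumption; the function name keep_fields and the min_run parameter are hypothetical, chosen only for this example.

```python
import re

def keep_fields(text, min_run=50):
    """Return the pieces of text separated by runs of min_run or more spaces.

    Hypothetical helper for illustration. Assumes min_run >= 2, since the
    pattern allows up to min_run - 1 spaces inside a field.
    """
    pattern = re.compile(r"\S+(?: {1,%d}\S+)*" % (min_run - 1))
    return pattern.findall(text)

# Small threshold so the behaviour is easy to see: runs of 5+ spaces separate fields.
sample = "a  b" + " " * 5 + "c d" + " " * 7
short = keep_fields(sample, min_run=5)
print(short)
```

Note that \S excludes every whitespace character, so a tab or newline also ends a field here, whereas splitting on [ ]{50,} only breaks on spaces.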
The fourth bird