You can use `re.split` to split the string with a regular expression:

    import re

    s = '"Distance 1: Distance XY" 1 2 4 5 9 "Distance 2: Distance XY" 3 6 8 10 5 "Distance 3: Distance XY" 88 45 36 12 4'
    re.split(r'(?<=\d)\s+(?=")', s)
    # ['"Distance 1: Distance XY" 1 2 4 5 9',
    #  '"Distance 2: Distance XY" 3 6 8 10 5',
    #  '"Distance 3: Distance XY" 88 45 36 12 4']
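If you would rather describe the records than the gaps between them, `re.findall` is an alternative. This is a minimal sketch, assuming every label is a double-quoted string with no embedded quotes and is always followed by at least one integer:

```python
import re

s = ('"Distance 1: Distance XY" 1 2 4 5 9 '
     '"Distance 2: Distance XY" 3 6 8 10 5 '
     '"Distance 3: Distance XY" 88 45 36 12 4')

# Match a quoted label followed by one or more whitespace-separated integers.
# Assumes labels never contain a double quote themselves.
records = re.findall(r'"[^"]*"(?:\s+\d+)+', s)
print(records)
```

Which one reads better depends on the data: `findall` silently skips text that does not fit the record pattern, while `split` keeps everything.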
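`(?<=\d)\s+(?=")` restricts the delimiter to the whitespace between a digit and a quote. The lookbehind `(?<=\d)` and the lookahead `(?=")` are zero-width, so the digit and the quote stay in the resulting pieces.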
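To see why the lookarounds matter, here is a small comparison on a made-up string `s`: a pattern that consumes the digit and the quote throws them away, while the lookaround version keeps both.

```python
import re

s = '"A" 1 2 "B" 3'

# Consuming version: the digit and the quote are part of the delimiter,
# so re.split() drops them from the output.
consuming = re.split(r'\d\s+"', s)

# Lookaround version: only the whitespace is consumed; (?<=\d) and (?=")
# are zero-width checks, so both neighbours stay in the pieces.
lookaround = re.split(r'(?<=\d)\s+(?=")', s)

print(consuming)   # ['"A" 1 ', 'B" 3']
print(lookaround)  # ['"A" 1 2', '"B" 3']
```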
If the text file uses smart (curly) quotes, replace `"` in the lookahead with the left smart quote `“` (Option + [ on a Mac; check here for Windows):

    # assumes test.txt is UTF-8 encoded, so the curly quotes decode correctly
    with open("test.txt", encoding="utf-8") as f:
        for line in f:
            print(re.split(r'(?<=\d)\s+(?=“)', line.rstrip("\n")))
    # ['“Distance 1: Distance XY” 1 2 4 5 9', '“Distance 2: Distance XY” 3 6 8 10 5', '“Distance 3: Distance XY” 88 45 36 12 4']
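If a file might mix straight and curly quotes, a character class can accept either one. This sketch uses `io.StringIO` to stand in for the file (the sample lines are made up); with a real file you would iterate over `open(..., encoding="utf-8")` instead:

```python
import io
import re

# Hypothetical file contents: one line with curly quotes, one with straight quotes.
sample = io.StringIO(
    '“Distance 1: Distance XY” 1 2 “Distance 2: Distance XY” 3 4\n'
    '"Distance 3: Distance XY" 5 6 "Distance 4: Distance XY" 7 8\n'
)

# ["“] accepts either a straight double quote or a left curly quote
# as the start of the next record.
pattern = re.compile(r'(?<=\d)\s+(?=["“])')

rows = [pattern.split(line.rstrip("\n")) for line in sample]
for row in rows:
    print(row)
```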
Or use the Unicode escape for the left double quotation mark, `\u201C`, which keeps the pattern ASCII-only:

    with open("test.txt", encoding="utf-8") as f:
        for line in f:
            print(re.split(r'(?<=\d)\s+(?=\u201C)', line.rstrip("\n")))
    # ['“Distance 1: Distance XY” 1 2 4 5 9', '“Distance 2: Distance XY” 3 6 8 10 5', '“Distance 3: Distance XY” 88 45 36 12 4']
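Because the pattern is a raw string, Python leaves `\u201C` untouched and the `re` module decodes the escape itself (supported since Python 3.3). A quick check on a made-up line that both spellings split identically:

```python
import re

text = '“Distance 1: Distance XY” 1 2 “Distance 2: Distance XY” 3 4'

literal = re.split(r'(?<=\d)\s+(?=“)', text)
escaped = re.split(r'(?<=\d)\s+(?=\u201C)', text)

# In a raw string the backslash survives, so the pattern text still holds
# the six characters \u201C; re turns them into the curly quote.
print(len(r'\u201C'))      # 6
print(literal == escaped)  # True
```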