I'm trying to find a regex that splits the string in a (below) into a list. I haven't yet found a foolproof way of splitting it, but my main question is why the last string is duplicated in the output. This does not happen when I test the pattern online at regex101.com. As far as I understand, re.split should have no reason to duplicate data.
The code is:
import re
a = ['"This is a string", "and this is another with a , in it", Thisisalsovalid, "",,,"And a string"']
b = re.split(r',(?=(".*?"|[\w/-]*|,))', a[0])
for i in b:
    print(i)
and the output:
"This is a string"
"and this is another with a
in it"
Thisisalsovalid
""
"And a string"
"And a string"
The expected output is:
"This is a string"
"and this is another with a , in it"
Thisisalsovalid
""
"And a string"
The resulting list will be zipped with a list of headers, so the fields must line up by index.
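On the duplicated last entry, here is a reduced case that may help narrow it down. It is only a sketch: the short input string and the variable names are made up for illustration and are not part of my real data. It compares the same lookahead written with a capturing group and with a non-capturing (?:...) group. The Python documentation for re.split says that when the pattern contains capturing parentheses, the text of those groups is also returned in the result list, which would be consistent with what I see.

```python
import re

# Hypothetical shortened input, just enough to show the effect.
s = 'x,"And a string"'

# Capturing group inside the lookahead: re.split also returns the captured text.
with_group = re.split(r',(?=(".*?"))', s)

# Same lookahead with a non-capturing group: only the split pieces are returned.
without_group = re.split(r',(?=(?:".*?"))', s)

print(with_group)     # ['x', '"And a string"', '"And a string"']
print(without_group)  # ['x', '"And a string"']
```

If this is indeed the cause, regex101.com probably did not show it because it displays matches rather than the list that re.split builds.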
As a bonus, I would welcome a regex that splits on ',' except when the comma occurs inside a quoted string.
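To make the bonus request concrete, this is a rough sketch of the behaviour I am after, not a verified solution. It assumes that double quotes are always balanced, that quoted strings never contain escaped quotes, and that empty fields between consecutive commas should be kept as empty strings so the result still lines up with the headers. The names line, outside_quotes, and fields are illustrative only.

```python
import re

line = '"This is a string", "and this is another with a , in it", Thisisalsovalid, "",,,"And a string"'

# Split on a comma only when an even number of double quotes follows it,
# i.e. when the comma sits outside any quoted string.
# All groups are non-capturing, so re.split adds no captured text to the result.
outside_quotes = re.compile(r',(?=(?:[^"]*"[^"]*")*[^"]*$)')

# Strip the whitespace that follows some of the separators.
fields = [field.strip() for field in outside_quotes.split(line)]

for field in fields:
    print(field)
```

On the sample this gives seven fields, two of them empty (from the ",,," run). They could be filtered out with a list comprehension if they are not wanted, but that would shift the indexes against the header list.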