I'm struggling to split text rows on a variable delimiter while preserving empty fields and quoted data.
Examples:
1,"2",three,'four, 4',,"6\tsix"
or as a tab-delimited version:
1\t"2"\tthree\t'four, 4'\t\t"6\tsix"
Should both result in:
['1', '"2"', 'three', 'four, 4', '', "6\tsix"]
So far, I've tried:
Using split, but delimiters inside quotes are not handled as desired.
Solutions using the csv library, but its quoting options seem to quote either everything or nothing, without preserving the original quotes.
Regex, particularly the pattern from the following answer, but it drops the empty fields: How to split but ignore separators in quoted strings, in python?
Using the pyparsing library. The best I've managed is as follows, but this also drops the empty fields (using the comma-delimited example):
from pyparsing import OneOrMore, Word, delimitedList, printables, quotedString
s = '1,"2",three,\'four, 4\',,"6\tsix"'
wordchars = (printables + ' \t\r\n').replace(',', '', 1)
delimitedList(OneOrMore(quotedString | Word(wordchars)), ',').parseWithTabs().parseString(s)
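For comparison, a hand-written character scanner does keep the empty fields. This is only a minimal sketch and not any library's API: `split_quoted` is a hypothetical helper name, it assumes a single-character delimiter and balanced quotes, and it leaves the quote characters in place rather than stripping them.

```python
def split_quoted(line, delimiter=","):
    """Split line on delimiter, ignoring delimiters inside '...' or "...".

    Assumptions (not from any library): single-character delimiter,
    balanced quotes, and quote characters are kept in the output.
    """
    fields = []
    current = []
    quote = None  # the quote character we are currently inside, if any
    for ch in line:
        if quote:
            # Inside quotes: copy everything, close on the matching quote.
            current.append(ch)
            if ch == quote:
                quote = None
        elif ch in "'\"":
            quote = ch
            current.append(ch)
        elif ch == delimiter:
            # Field boundary: emit the field even if it is empty.
            fields.append("".join(current))
            current = []
        else:
            current.append(ch)
    fields.append("".join(current))  # last field (may be empty)
    return fields


comma_row = '1,"2",three,\'four, 4\',,"6\tsix"'
tab_row = '1\t"2"\tthree\t\'four, 4\'\t\t"6\tsix"'
print(split_quoted(comma_row, ","))
print(split_quoted(tab_row, "\t"))
```

Both rows produce the same six fields, including the empty fifth one. An apostrophe in unquoted text (such as don't) would be treated as an opening quote, so real data may need a stricter rule, for example only recognising a quote at the start of a field.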
Thanks for any ideas!