2

I have a file that contains a tab-delimited header and one data line, like so:

ID  Field1
test1   "A","B"

Here's my parsing script.

import csv

with open(dataFile) as tsv:
    for line in csv.reader(tsv, delimiter='\t'):
        print(line)

And the output:

['ID', 'Field1']
['test1', 'A,"B"']

I can't figure out why it's stripping the double quotes on the first quoted item of the second field. I've tried different dialects and settings for csv reader with no success.

martineau
YTsa

3 Answers

3

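The default quotechar for csv.reader is the double quote, so the reader treats the leading " as the start of a quoted field and removes that pair of quotes; everything after the closing quote is then appended to the field as literal text, which is why "B" keeps its quotes. Changing quotechar to a character that does not occur in your data, such as '|', solves the problem. You can do it like this: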

import csv

with open(dataFile) as tsv:
    # Use '|' as the quote character so double quotes are kept as literal text.
    for line in csv.reader(tsv, delimiter='\t', quotechar='|'):
        print(line)

From https://docs.python.org/3/library/csv.html#csv.Dialect.quotechar:

Dialect.quotechar

A one-character string used to quote fields containing special characters, such as the delimiter or quotechar, or which contain new-line characters. It defaults to '"'.

EDIT:

You can also use the quoting=csv.QUOTE_NONE option to disable quote handling altogether.
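
To compare these options side by side, here is a minimal sketch that parses the sample line from an in-memory string rather than a file. The `SAMPLE` constant and the `parse` helper are hypothetical names introduced for illustration, and the '|' quote character assumes that '|' never appears in the data.

```python
import csv
import io

# Hypothetical sample mirroring the question's file (tab between columns).
SAMPLE = 'ID\tField1\ntest1\t"A","B"\n'


def parse(**reader_kwargs):
    """Parse SAMPLE as tab-delimited text, passing extra options to csv.reader."""
    source = io.StringIO(SAMPLE, newline='')
    return list(csv.reader(source, delimiter='\t', **reader_kwargs))


default_rows = parse()                      # quotechar defaults to '"'
pipe_rows = parse(quotechar='|')            # assumes '|' is absent from the data
none_rows = parse(quoting=csv.QUOTE_NONE)   # quote characters get no special handling

print(default_rows[1])  # ['test1', 'A,"B"']
print(pipe_rows[1])     # ['test1', '"A","B"']
print(none_rows[1])     # ['test1', '"A","B"']
```

Both alternatives keep the second field intact; the difference is that quotechar='|' still strips '|' characters used as quotes, while QUOTE_NONE leaves every character as-is.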

Aysu Sayın
2

You just need to tell the csv.reader to ignore quoting, via the csv.QUOTE_NONE option:

import csv

with open(dataFile) as tsv:
    # QUOTE_NONE: the reader gives quote characters no special handling.
    for line in csv.reader(tsv, delimiter='\t', quoting=csv.QUOTE_NONE):
        print(line)

Output:

['ID', 'Field1']
['test1', '"A","B"']
martineau
  • Thanks! Went with this as the solution, as it ignores the quoting rather than changing it – YTsa May 15 '20 at 16:49
  • @YTsa: FWIW, you can turn the data in the second column into a tuple of strings like `('A', 'B')` by using `line[1] = ast.literal_eval(line[1])`. – martineau May 15 '20 at 17:42
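
The `ast.literal_eval` suggestion in the comment above can be sketched as follows. The `line` value is a hypothetical row of the shape csv.reader returns with quoting=csv.QUOTE_NONE; this assumes the field always holds valid Python literals.

```python
import ast

# Hypothetical row, as produced by csv.reader with quoting=csv.QUOTE_NONE.
line = ['test1', '"A","B"']

# literal_eval evaluates '"A","B"' as a Python literal: a tuple of two strings.
line[1] = ast.literal_eval(line[1])
print(line)  # ['test1', ('A', 'B')]
```

Note that a field holding a single quoted item, such as "A", evaluates to a plain string rather than a one-element tuple, so callers may need to handle that case separately.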
0

It seems you are delimiting on tabs and not actually splitting on the comma; I would change your code to reflect this.

Nick Juelich
  • Just to add to this: It seems like you are only bringing in one field. Your program doesn't see a tab so it assumes there is only one field on that first line. (please mark Nick's answer as correct if it solves your problem) – Ben May 15 '20 at 16:11
  • How does this answer help the post? He's not asking about the comma – João Castilho May 15 '20 at 16:13
  • Nick, to Joao's point, perhaps you could show the line of code that should change, along with the suggested changes. – Ben May 15 '20 at 16:15
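
One possible reading of this answer's suggestion, sketched under the assumption that the goal is a list of items from the second column: split on tabs first, then parse that column as its own comma-separated record. The `sample` string and the variable names are hypothetical and not taken from the answer.

```python
import csv
import io

# Hypothetical sample mirroring the question's file (tab between columns).
sample = 'ID\tField1\ntest1\t"A","B"\n'

# First pass: split on tabs only, leaving the double quotes untouched.
reader = csv.reader(io.StringIO(sample, newline=''), delimiter='\t',
                    quoting=csv.QUOTE_NONE)
header = next(reader)  # keep the header row as-is

rows = []
for row in reader:
    # Second pass: parse the second column as a one-line CSV record,
    # using the default comma delimiter and double-quote quotechar.
    row[1] = next(csv.reader([row[1]]))
    rows.append(row)

print(header)   # ['ID', 'Field1']
print(rows[0])  # ['test1', ['A', 'B']]
```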