Is it possible to use read_csv on a file whose fields are wrapped in double quotes (so that commas inside values are ignored), but whose values can also contain an unescaped double quote? An example file looks like this:

"fie,ld1","fi"e,ld2","field3"
"test","testing","meow"

Desired output is this:

fie,ld1 fi"e,ld2 field3
test    testing  meow

I've tried all sorts of read_csv options, ChatGPT, and web searching.

John Gordon
matixsnow
  • The first row is not valid. You'll have to fix the file before you can read it. See https://stackoverflow.com/q/17808511/494134 – John Gordon Aug 17 '23 at 02:28
  • Please always include your code for what you've tried, so that others don't suggest something you've already ruled out, and so that people can see if your code has errors. See [ask] – Robert Aug 17 '23 at 02:41
  • I don't think it matters if you see my garbage read_csv attempts. John didn't have any trouble answering the question. – matixsnow Aug 17 '23 at 02:43
  • If values can have both double quotes and commas without escaping, then values like `fie","ld2` are valid, but it is logically impossible to parse them correctly. Is it guaranteed that there is no such value? If so, how? – ken Aug 17 '23 at 03:59

1 Answer

I think you are saying that "," is the delimiter in this document and you want to clean the data.

The following code prints:

fie,ld1 fi"e,ld2 field3
test testing meow

and the output can then be appended to a "clean" csv file.

import re

# Read the whole file and split it into a list of lines
with open("strangeFile.csv", "r") as f:
    lines = f.read().splitlines()

for line in lines:
    items = line.split('","')  # split on the 'strange' delimiter between fields
    line = " ".join(items)  # join with a space, comma or tab
    line = re.sub(r'^"|"$', "", line)  # remove the opening and closing quote marks

    print(line)
    # append line to a new csv file
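The "append line to a new csv file" step could be sketched as follows, using only the standard library. This is a minimal sketch, not a definitive implementation: the helper names `parse_line` and `to_clean_csv` are made up for illustration, and it assumes that every field is wrapped in double quotes and that no value contains the sequence `","` itself (the ambiguity ken raises in the comments). Python's `csv.writer` doubles any embedded quote on output, so the result is standard CSV.

```python
import csv
import io


def parse_line(line):
    """Split one raw line into fields (hypothetical helper).

    Assumes the line starts and ends with a double quote and that
    fields are separated by the three-character sequence '","'.
    """
    # Drop the outer quotes first, then split on the field separator.
    if len(line) >= 2 and line.startswith('"') and line.endswith('"'):
        line = line[1:-1]
    return line.split('","')


def to_clean_csv(raw_text):
    """Return raw_text rewritten as standard CSV (hypothetical helper)."""
    rows = [parse_line(line) for line in raw_text.splitlines() if line]
    out = io.StringIO(newline="")
    # csv.writer quotes fields that contain commas or quotes and
    # escapes an embedded quote by doubling it.
    csv.writer(out).writerows(rows)
    return out.getvalue()


# The two example lines from the question.
raw = '"fie,ld1","fi"e,ld2","field3"\n"test","testing","meow"\n'
clean = to_clean_csv(raw)
print(clean)
```

Writing `clean` to a file gives something that `pandas.read_csv` should be able to read with its default quoting settings, since doubled quotes are the standard CSV escape.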
Ned Hulton