0

My csv file looks like this:

"City","Name","Comment"
"A","Jay","Like it"
"B","Rosy","Well, good"
...

"K","Anna","Works "fine""

The expected output(dataframe):

City,Name,Comment

A,Jay,'Like it'

B,Rosy,'Well, good'

...

K,Anna,'Works "fine"'

I am trying to read it by doing this :

df=pd.read_csv("test.csv", sep=',', engine='python',encoding='utf8', quoting=csv.QUOTE_ALL)

And it is giving error like this :

ParserError: unexpected end of data

I need to have all the data. So I can not skip any lines with error_bad_lines=True.

How I can fix this issue?

UPDATE: It turns out my original CSV file is missing a quote at the end of the file. I solved the problem by identifying the errors in the file and modifying them.

Claire_L
  • 13
  • 1
  • 6
  • Try to do ```quoting=csv.QUOTE_NONE``` –  Aug 06 '21 at 16:17
  • No luck. I will get an error like this ParserError: Expected 578 fields in line 96, saw 580. (Just a heads up. My csv file has hundreds of columns) – Claire_L Aug 06 '21 at 16:19
  • @Claire_L Is double quote after _"fine""_ intentionally added? _"Works "_ ends after white space? – imxitiz Aug 06 '21 at 16:36
  • @Xitiz Yes. This string has double quotes in it. – Claire_L Aug 06 '21 at 16:42
  • @Claire_L Expected output for that provided input? – imxitiz Aug 06 '21 at 16:52
  • Noooooooo, not here in question! And not for only last for complete dataframe @Claire_L – imxitiz Aug 06 '21 at 17:00
  • @Xitiz Take the last row as an example. It should look like this in a dataframe:| K | Anna | Works "fine" ('|' is not an actual character. It is just used to distinguish different cells in the dataframe) – Claire_L Aug 06 '21 at 17:03
  • @Claire_L I may provide complete solution just after a bit but test `doublequote=1` and `quoting=1` is this what you want? We can remove that `""` after "fine" and add `'` but I don't think we can do `works "fine"` at least not by me! – imxitiz Aug 06 '21 at 17:17
  • @Claire_L welcome to SO, It seems the error occured because the last line in your .csv file. " work "fine"" should be "work fine" or "work \"fine\"" , if you want to use double-quoted string. – Raha Moosavi Aug 06 '21 at 18:03

1 Answers1

1

I believe the trick is to preprocess and then read the data


import re
from io import StringIO
import pandas as pd

data = """
"City","Name","Comment"
"A","Jay","Like it"
"B","Rosy","Well, good"
"K","Anna","Works "fine""
"""

data = re.sub('(?<!^)"(?!,")(?<!,")(?!$)', '\\"', data, flags=re.M)

x = pd.read_csv(StringIO(data), escapechar='\\')

print(x)

Outputs

   City  Name       Comment
0     A   Jay       Like it
1     B  Rosy    Well, good
2     K  Anna  Works "fine"

And in theory this should work the same with the file

with open('test.csv', 'r') as f:
    data = re.sub('(?<!^)"(?!,")(?<!,")(?!$)', '\\"', f.read(), flags=re.M)
    df = pd.read_csv(StringIO(data), escapechar='\\')
    print(df)

Edit : It outputs as following

  City  Name       Comment
0    A   Jay       Like it
1    B  Rosy    Well, good
2    K  Anna  Works "fine"

From kayoz answer

Edit 2: and the last column is easy to change with a lambda or similar function.

df['Comment'] = df['Comment'].apply(lambda x: "'" + str(x) + "'")

Link to the original

TL;DR

import re
from io import StringIO
import pandas as pd

with open('test.csv', 'r') as f:
    data = re.sub('(?<!^)"(?!,")(?<!,")(?!$)', '\\"', f.read(), flags=re.M)
    df = pd.read_csv(StringIO(data), escapechar='\\')
    df['Comment'] = df['Comment'].apply(lambda x: "'" + str(x) + "'")
    print(df)
  City  Name  Comment
0    A   Jay  'Like it'
1    B  Rosy  'Well, good'
2    K  Anna  'Works "fine"'
Ossi H.
  • 84
  • 6