Here is the content of a CSV file, 'test.csv', which I am trying to read with pandas' read_csv():

"col1", "col2", "col3", "col4"
"v1", "v2", "v3", "v4"
"v21", "v22", "v23", "this, "creating, what to do? " problems"

This is the command I am using:

messages = pd.read_csv('test.csv', sep=',', skipinitialspace=True)

But I am getting the following error:

CParserError: Error tokenizing data. C error: Expected 4 fields in line 3, saw 5

I want the content of col4 in line 3 to be 'this, "creating, what to do? " problems'.

How can I read the file when a column can contain both the quotechar and the delimiter?

Amol Sharma
  • The problem is that your CSV appears to be malformed. Pandas will allow you to use `"` as a `quotechar`, but you have unescaped quote characters inside your column. If your third row were instead `"v21", "v22", "v23", "this, \"creating, what to do? \" problems"`, you could use `\` as the `escapechar`, and this would work. – SPKoder Feb 28 '16 at 19:14
  • The CSV is not something I am generating, so I can't control that. – Amol Sharma Feb 28 '16 at 19:20
  • One option that's working for me is using `'",'` as the delimiter, but that requires an additional cleanup step to remove the other `"` from the columns. – Amol Sharma Feb 28 '16 at 19:21
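
To illustrate the `escapechar` suggestion from the comments, here is a minimal sketch using Python's standard `csv` module rather than pandas, so it runs without extra dependencies. It assumes a hypothetical, corrected version of the file in which the inner quotes are backslash-escaped, which is not the case for the file in the question. `pd.read_csv` accepts parameters with the same names (`quotechar`, `escapechar`, `skipinitialspace`).

```python
import csv
import io

# Hypothetical well-formed variant of the data: the inner quotes in the
# last field are backslash-escaped (the real file does NOT do this).
escaped_text = (
    '"col1", "col2", "col3", "col4"\n'
    '"v1", "v2", "v3", "v4"\n'
    '"v21", "v22", "v23", "this, \\"creating, what to do? \\" problems"\n'
)

reader = csv.reader(
    io.StringIO(escaped_text),
    delimiter=",",
    quotechar='"',
    escapechar="\\",        # a backslash makes the next character literal
    skipinitialspace=True,  # ignore the space that follows each comma
)
escaped_rows = list(reader)

# The last field of the third line keeps its inner quotes and comma.
print(escaped_rows[2][3])
```

This only helps when whoever produces the file escapes the quotes, so it does not apply to the file as given.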

1 Answer

pandas does not allow you to keep malformed rows, and to be honest I don't see a way of ignoring some " characters but not others in your example. I think your intuition of using '", "' as the delimiter and then cleaning up is the best approach. A separator longer than one character is treated as a regular expression and handled by the Python parsing engine. If you want to do this in one line:

message = pd.read_csv('test.csv', sep='", "', engine='python', names=['col1', 'col2', 'col3', 'col4'], skiprows=1).apply(lambda x: x.str.strip('"'))

which sidesteps the quoted header row by supplying the column names explicitly, strips the leftover quotes from the values, and gives you:

>>> message
  col1 col2 col3                                     col4
0   v1   v2   v3                                       v4
1  v21  v22  v23  this, "creating, what to do? " problems
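
The same split-then-strip idea can be sketched without pandas, which makes it easy to see what happens to each line. This is only an illustration under stated assumptions: the helper name `split_row` and the inline sample text are made up for the example, and the approach assumes every field is separated by exactly `", "` (quote, comma, space, quote), as in the file above.

```python
SEPARATOR = '", "'  # closing quote, comma, space, opening quote

def split_row(line):
    """Split one line on the quote-comma-space-quote boundary,
    then strip the leftover outer quotes from each field."""
    return [field.strip('"') for field in line.strip().split(SEPARATOR)]

# Inline copy of the file from the question, including the malformed row.
sample_text = (
    '"col1", "col2", "col3", "col4"\n'
    '"v1", "v2", "v3", "v4"\n'
    '"v21", "v22", "v23", "this, "creating, what to do? " problems"\n'
)

rows = [split_row(line) for line in sample_text.splitlines() if line.strip()]
header, records = rows[0], rows[1:]

for record in records:
    print(dict(zip(header, record)))
```

If a DataFrame is needed, `pd.DataFrame(records, columns=header)` should accept this result. Note that `strip('"')` also removes quotes that legitimately begin or end a value, the same caveat that applies to the one-liner.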
bunji
  • @ragesz Can you clarify your question please. Do you mean "what if the column labels are not quoted?" or "what if certain columns contain values that are not quoted?" – bunji May 12 '16 at 16:11
  • I had [this](http://stackoverflow.com/questions/37074914/python-pandas-read-csv-quotechar-does-not-work) problem and I was searching for a solution but I didn't find any, so finally I asked, and got an answer. – ragesz May 12 '16 at 16:42