I'm really struggling to read some messy string data from a CSV file into a DataFrame. I'm working with Python 3.8.5 and Pandas 1.1.3.

My CSV file contains integers, dates, and strings enclosed in double-quote (") characters. The header and the first few rows are shown in the edit at the end of this question.

Importing Pandas as pd and reading the file like this:

import pandas as pd
import csv


myfilename = 'Input.csv'
myexpenses = pd.read_csv(myfilename, skipinitialspace=True, quotechar='"', quoting=csv.QUOTE_ALL)

Consistently gives me:

ParserError: Error tokenizing data. C error: Expected 37 fields in line 3, saw 42

This is clearly because some of the quoted strings contain commas, so there are actually 41 commas spread across the 37 fields. I'm not sure why these aren't being handled by the quotechar argument.

How can I handle these strings as a single field entry?

I've probably tried a dozen different tweaks of this function such as this, this, this, etc.

These include tweaking arguments like quoting=csv.QUOTE_ALL, without result.

Do I need to involve regular expressions?

Any help would be greatly appreciated - thank you in advance.

EDIT: Added full code. For those asking, these are the columns and first five entry lines of the CSV file:

Date,Employee,Project,Expense,Description,Units,Cost Rate,Cost Amount,Markup %,Charge Amount,Billable,Billed Status,Submit Status,Approved By,Reimbursable,Paid,Paid Date,Income Account,Expense Account,Class,Country,Purchase Tax Rate,Extra,Tax1 %,Tax2 %,Tax3 %,Credit Card,Check Number,Vendor Bill Number,Invoice Number,Client,Attachments,Memo,Created By,Created On,Last Updated By,Last Updated

8/27/2021,"Lastname, Firstname","2021-123 - Bob & Bob (Bob's - New York, NY) Project - Support",Meals:Project Meals,Meals,1,43.64,43.64,0,43.64,True,Un Billed,Un-Submitted,,False,False,8/27/2021,,,Meals-Project,U.S. Dollar,0,False,0,0,0,1234-56 - Lastname - 1234,,,,"Bob & Bob, Inc.",1,,"Lastname, Firstname",8/27/2021,"Lastname, Firstname",8/27/2021

8/27/2021,"Lastname, Firstname","2021-123 - Bob & Bob (Bob's - New York, NY) Project - Support",Auto Expense:Fuel,Fuel,1,29.41,29.41,0,29.41,True,Un Billed,Un-Submitted,,False,False,8/27/2021,,,Fuel,U.S. Dollar,0,False,0,0,0,1234-56 - Lastname - 1234,,,,"Bob & Bob, Inc.",1,,"Lastname, Firstname",8/27/2021,"Lastname, Firstname",8/27/2021

8/27/2021,"Lastname, Firstname","2021-123 - Bob & Bob (Bob's - New York, NY) Project - Support",Airfare:Flight,Plane Ticket,1,658.4,658.4,0,658.4,True,Un Billed,Un-Submitted,,True,False,8/27/2021,,,Flight,U.S. Dollar,0,False,0,0,0,,,,,"Bob & Bob, Inc.",1,,"Lastname, Firstname",8/27/2021,"Lastname, Firstname",8/27/2021

8/26/2021,"Lastname, Firstname","2021-123 - Bob & Bob (Bob's - New York, NY) Project - Support",Meals:Project Meals,Meals,1,32.28,32.28,0,32.28,True,Un Billed,Un-Submitted,,False,False,8/26/2021,,,Meals-Project,U.S. Dollar,0,False,0,0,0,1234-56 - Lastname - 1234,,,,"Bob & Bob, Inc.",1,,"Lastname, Firstname",8/27/2021,"Lastname, Firstname",8/27/2021

8/26/2021,"Lastname, Firstname","2021-123 - Bob & Bob (Bob's - New York, NY) Project - Support",Meals:Project Meals,Meals,1,6.58,6.58,0,6.58,True,Un Billed,Un-Submitted,,False,False,8/26/2021,,,Meals-Project,U.S. Dollar,0,False,0,0,0,1234-56 - Lastname - 1234,,,,"Bob & Bob, Inc.",1,,"Lastname, Firstname",8/27/2021,"Lastname, Firstname",8/27/2021
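
As a sanity check on the rows above, here is a minimal sketch of how read_csv treats quoted commas with its default settings. It assumes an abbreviated three-column version of the sample (the column subset and the in-memory io.StringIO buffer are assumptions made for illustration, not the real Input.csv), so it shows the expected behavior rather than reproducing the error.

```python
import io

import pandas as pd

# Hypothetical three-column excerpt modeled on the sample rows above.
sample = (
    'Date,Employee,Client\n'
    '8/27/2021,"Lastname, Firstname","Bob & Bob, Inc."\n'
    '8/26/2021,"Lastname, Firstname","Bob & Bob, Inc."\n'
)

# Default quoting (csv.QUOTE_MINIMAL) with the default quotechar '"'.
df = pd.read_csv(io.StringIO(sample))

print(df.shape)               # (2, 3)
print(df.loc[0, "Employee"])  # Lastname, Firstname
```

If the real file behaves differently from this excerpt, the difference is likely in the file itself (for example, the quote characters actually used) rather than in the quoted commas alone.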

Greg
  • Does this answer your question? https://stackoverflow.com/questions/18039057/python-pandas-error-tokenizing-data – dhananjay sawarkar Aug 28 '21 at 18:26
  • Can you show us line 3? It could be that there are some UTF quote characters that look like " but are actually different characters. Or there are internal quotes that are not escaped, or the line just isn't formatted properly. – Garr Godfrey Aug 28 '21 at 18:28
  • Give an exact example of the first 5 rows or so to reproduce the exact error. Provide the full code needed to reproduce the error, including imports. – Mark Tolonen Aug 28 '21 at 18:33
  • Yes, since the error is in row 3, showing us part of a row before that, which must be working, isn't helpful. – Garr Godfrey Aug 28 '21 at 18:42
  • Sure, I've added those edits with a sanitized input file. The only other code imports Pandas and csv, then sets the CSV filename for import. – Greg Aug 29 '21 at 16:22

1 Answer

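Try quoting=csv.QUOTE_NONE, passed as a keyword argument:

import pandas as pd
import csv

myfilename = 'Input.csv'  # filename from the question
# In Pandas 1.1.3 the second positional parameter of read_csv is sep, so pass quoting by keyword.
myexpenses = pd.read_csv(myfilename, quoting=csv.QUOTE_NONE)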
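
To see what the different quoting constants mean for a row like those in the question, here is a minimal sketch using the standard-library csv module, which defines the QUOTE_* constants that read_csv accepts. The row is a hypothetical abbreviation of the sample data and count_fields is a helper name invented for this illustration. Pandas' own parser is separate from csv.reader, so this only illustrates the constants and is not a test of read_csv itself.

```python
import csv
import io

# Hypothetical abbreviated row modeled on the question's sample data.
line = '8/27/2021,"Lastname, Firstname","Bob & Bob, Inc.",1\n'


def count_fields(text, quoting):
    """Return the number of fields csv.reader sees on each line of text."""
    reader = csv.reader(io.StringIO(text), quoting=quoting)
    return [len(row) for row in reader]


# QUOTE_MINIMAL honors the quotes, so commas inside them stay in one field.
minimal_counts = count_fields(line, csv.QUOTE_MINIMAL)
# QUOTE_NONE gives quote characters no special meaning, so every comma splits.
none_counts = count_fields(line, csv.QUOTE_NONE)

print(minimal_counts)  # [4]
print(none_counts)     # [6]
```

Running the same helper over the real file, line by line, is one way to find which rows produce an unexpected field count under each mode.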