I need to ingest a CSV file into a pandas dataframe daily. The CSV has several thousand rows, but every day I get a few records with more columns than expected. Let me give you an example. Take the following CSV:
Name, Address, Phone
John Doe, 777 Street, 3145678777
Jane Doe, 888 Street, 3145678888
Chris Doe, Adam Smith, 999 Street, 3145679999
Ellen Page, 222 Street, 3145679222
This is my import line:
df = pd.read_csv('myfile.csv', header=0, names=['Name', 'Address', 'Phone'])
As expected, row number 3 (Chris Doe, Adam Smith, 999 Street, 3145679999) breaks the process because it has an extra column. It seems that the application my source data comes from allows users to enter commas in the Name field and, sometimes, they do so when multiple users share the same household. I cannot change the application.
What I am aiming at is simply detecting those rows and moving them to a separate text file or dataframe, anything that makes sense. I can deal with those records separately, that is fine.
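For context, the closest I have come is a manual pre-pass over the raw file like the sketch below (untested; 'myfile.csv' and 'bad_rows.csv' are just placeholder names I made up), but I suspect there is a cleaner way to do this directly with pandas:

import csv
import pandas as pd

good_rows, bad_rows = [], []

# Pre-scan the raw file: rows with exactly 3 fields go to pandas,
# anything wider gets set aside for separate handling.
with open('myfile.csv', newline='') as f:
    reader = csv.reader(f)
    header = next(reader)  # skip the original header row
    for row in reader:
        (good_rows if len(row) == 3 else bad_rows).append(row)

df = pd.DataFrame(good_rows, columns=['Name', 'Address', 'Phone'])

# Dump the oversized rows to a separate file for later review.
with open('bad_rows.csv', 'w', newline='') as f:
    csv.writer(f).writerows(bad_rows)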
I have seen some posts aiming at processing CSVs with a varying number of columns per row. I reckon that can complicate my process and, in my case, it is not worth it.
I would appreciate any help on how to achieve this in the most straightforward and simple way possible.
Thanks!