I have data stored in a CSV file with 250+ columns and about 700K records. I am encountering a parse error while reading it. My objective is not simply to suppress the error but to debug/identify the records that cause it.
I have already referred to the posts here, here, and here, so this is not a duplicate.
When I try the code below, I get the parse error shown beneath it:
df1 = pd.read_csv('New__Document.csv',low_memory=False)
ParserError: Error tokenizing data. C error: Expected 258 fields in line 14, saw 263
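To narrow the problem down, I also tried counting the fields in each raw line with the standard csv module, assuming the file really is comma-delimited and the first row is the header (a rough sketch of what I ran):

import csv

# Report every raw line whose field count differs from the header's width
with open('New__Document.csv', newline='') as f:
    reader = csv.reader(f)
    expected = len(next(reader))               # number of columns in the header row
    for line_no, row in enumerate(reader, start=2):
        if len(row) != expected:
            print(line_no, len(row))           # line number and its field count

Unlike a plain comma count, csv.reader respects quoting, but this only gives me line numbers, not the records themselves in a reviewable form.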
Based on this post, I followed the suggestion below and it runs without error:
df = pd.read_csv('New__Document.csv',low_memory=False,on_bad_lines='skip')
len(df) # returns 365902 records
However, this results in a loss of records, so I tried another suggestion:
df1 = pd.read_csv('New__Document.csv',low_memory=False, sep='\t')
len(df1) # returns 762521 records.
But this doesn't give a usable tabular output: the file is comma-delimited, so the tab separator never matches and every line is read as a single string in one column (which is also why no tokenizing error is raised).
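One workaround I considered with this single-column frame is to count the commas in each line and compare against the header line, roughly like this (it over-counts if any quoted field itself contains a comma, so I don't fully trust it):

col = df1.iloc[:, 0]                             # each data line as one long string
expected = df1.columns[0].count(',')             # comma count in the header line
suspect = df1[col.str.count(',') != expected]    # lines whose field count differs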
I would like to view the offending/bad records (762,521 - 365,902 = 396,619 of them) in a neat tabular format, because more than half of the dataset is lost to this issue.
So I am seeking your help to understand what is causing it. If I can identify the error-causing records and store them in a table, it would be much easier for me to review them.
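For what it's worth, this is roughly what I was hoping to end up with, based on my reading that pandas 1.4+ accepts a callable for on_bad_lines when engine='python' (please correct me if this is not the right tool):

import pandas as pd

bad_rows = []

def capture(bad_line):
    # Called once per offending line (already split into fields);
    # keep a copy and return None so pandas drops it from the main frame.
    bad_rows.append(bad_line)
    return None

df = pd.read_csv('New__Document.csv', engine='python', on_bad_lines=capture)
bad_df = pd.DataFrame(bad_rows)    # the rejected records, one field per column

The python engine will be slower on a file this size, but if this is correct it would give me both the clean data and the rejected rows in tabular form.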