
I have a number of supposed CSVs, but in fact some of their rows have different numbers of fields. I would like to find out which rows these are and look at them. If the CSVs weren't broken I would just use pandas and do:

df = pd.read_csv("file.csv")

But this isn't suitable for the data cleaning and preprocessing I need to do.

How can I find the number of fields in each row of a "csv" file? Is it, for example, possible to read one row at a time, without remembering the number of fields from previous rows?
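Yes — `csv.reader` yields one parsed row at a time, so each row's width can be checked as it is read. A minimal sketch (the file contents here are made up for illustration, using the first row's width as the reference):

```python
import csv
import io

# Hypothetical sample standing in for "file.csv"; the third row has an extra field.
data = "a,b,c\n1,2,3\n4,5,6,7\n"

reader = csv.reader(io.StringIO(data))
expected = len(next(reader))  # take the header's width as the reference
bad = []
for lineno, row in enumerate(reader, start=2):
    if len(row) != expected:
        bad.append((lineno, row))
print(bad)  # [(3, ['4', '5', '6', '7'])]
```

For a real file, replace the `io.StringIO(data)` with `open("file.csv", newline="")`.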

Simd
  • Why the downvote? – Simd Jun 08 '18 at 20:54
  • You can visually get a list of all "bad" lines by calling `pd.read_csv('file.csv',error_bad_lines=False)`. I am not sure you can store it in a variable for further processing. – DYZ Jun 08 '18 at 21:07
  • [Possible duplicate](https://stackoverflow.com/questions/32334966/pandas-bad-lines-warning-capture). – DYZ Jun 08 '18 at 21:09
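Following up on the comments: in newer pandas versions (1.3+), `error_bad_lines` was replaced by `on_bad_lines`, which with `engine="python"` accepts a callable — so the bad rows can in fact be stored in a variable. A sketch with made-up data:

```python
import io
import pandas as pd

# Hypothetical sample; the third line has one field too many.
data = "a,b,c\n1,2,3\n4,5,6,7\n"

bad_rows = []
def capture(row):
    bad_rows.append(row)  # the bad line arrives as a list of strings
    return None           # returning None drops the row from the DataFrame

df = pd.read_csv(io.StringIO(data), on_bad_lines=capture, engine="python")
print(bad_rows)  # [['4', '5', '6', '7']]
```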

2 Answers


CSV is not a fully defined standard, so staying close to RFC 4180 you can do something like this:

import re

with open('file.csv', 'r') as f:
    # Mask commas inside double-quoted fields, then count the remaining
    # (field-separating) commas on each line.
    print([re.sub(r'("[^"]*),([^"]*")', r'\1<comma>\2', line).count(',')
           for line in f])

which counts the field-separating commas on each line after masking the ones enclosed in double quotes (the number of fields is that count plus one).
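To make the masking step concrete, here is what the substitution does to one hypothetical line (note that a single pass masks only one comma per quoted field; a field with several embedded commas would need the substitution applied repeatedly):

```python
import re

line = 'x,"hello, world",y'
masked = re.sub(r'("[^"]*),([^"]*")', r'\1<comma>\2', line)
print(masked)             # x,"hello<comma> world",y
print(masked.count(','))  # 2 separating commas, i.e. 3 fields
```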

Diego Torres Milano

It seems the following works.

import csv

def f(s):
    # Returns the number of fields in each line of the CSV text s.
    return map(len, csv.reader(s.split("\n")))
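As a quick sanity check, a self-contained version run on made-up sample text:

```python
import csv

def f(s):
    # Field count per line; csv.reader honors commas inside quoted fields.
    return map(len, csv.reader(s.split("\n")))

text = 'a,b,"c,d"\n1,2,3,4'
print(list(f(text)))  # [3, 4]
```

The quoted `"c,d"` is parsed as a single field, which is where this beats naive comma counting.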
Simd