
I am working with very large .csv files and am trying to find the number of lines in each file, as well as do other things such as converting the data to JSON.

My question is how to get around this limitation of the csv library, because I keep receiving the error shown below.

Here is a sample program that I know works in Python 3 and returns the number of rows in a CSV file:

    import csv

    filename = 'large-input.csv'
    with open(filename, "r", newline="") as f:
        reader = csv.reader(f, delimiter=",")
        data = list(reader)
        row_count = len(data)
        print(row_count)

However, I keep getting this error when I run it against a 1.5 GB CSV file:

    Traceback (most recent call last):
      File "csv-len.py", line 6, in <module>
        data = list(reader)
    _csv.Error: field larger than field limit (131072)

Any workaround for this issue is greatly appreciated. Thanks!
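For context, the 131072 in the message is the csv module's default per-field size cap. `csv.field_size_limit()` returns the current cap when called with no argument, and sets a new cap (returning the old one) when called with a value. A minimal sketch that reproduces the error on in-memory data and then lifts the cap; doubling the limit is an arbitrary choice for illustration:

```python
import csv
import io

# With no argument, field_size_limit() returns the current per-field cap.
default_limit = csv.field_size_limit()

# One row whose single unquoted field is one character over the cap.
sample = io.StringIO("x" * (default_limit + 1) + "\n")

error_message = None
try:
    list(csv.reader(sample))
except csv.Error as exc:
    error_message = str(exc)
print(error_message)

# With an argument, field_size_limit() sets a new cap and returns the old one.
old_limit = csv.field_size_limit(default_limit * 2)
sample.seek(0)
rows = list(csv.reader(sample))
print(len(rows), len(rows[0][0]))

# Put the original cap back so unrelated code is unaffected.
csv.field_size_limit(old_limit)
```

Note that the limit is global to the process, which is why the sketch restores it at the end.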

Seth Wahle
  • @OluwafemiSule This question is different. Reading the CSV into a generator won't help. You would still have to exhaust the generator to get the length. – Remolten May 23 '17 at 21:08
  • I don't know how to answer my own question, but the solution I found was to read the file with pandas and get the shape of the resulting DataFrame. – Seth Wahle Dec 19 '18 at 20:16

1 Answer


CSV files are generally newline-delimited, so running a file through a CSV parser just to count its lines is likely slower than simply counting the lines directly.
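The "generally" matters: a quoted field may contain a line break, in which case the physical line count and the CSV record count differ. A small sketch on made-up in-memory data (the sample text is purely illustrative) that shows the two counts side by side:

```python
import csv
import io

# Header plus two records; the second record's quoted field spans two lines.
text = 'id,comment\n1,plain\n2,"spans\ntwo lines"\n'

# Physical lines, which is what a plain line counter sees.
line_count = sum(1 for _ in io.StringIO(text))

# Records, as the csv module parses them.
record_count = sum(1 for _ in csv.reader(io.StringIO(text)))

print(line_count, record_count)  # 4 3
```

If your data never has embedded newlines, counting lines is fine; otherwise only a CSV parser gives the record count.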

Something like this would be much quicker. You could subtract a line for the header if necessary.

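    def row_count(filename):
        """Return the number of lines in filename, header included."""
        count = 0  # stays 0 for an empty file
        with open(filename) as f:
            # enumerate from 1 so the last index equals the line count
            for count, _ in enumerate(f, start=1):
                pass
        return count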
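If you need CSV records rather than physical lines, one commonly used workaround is to raise the field limit and stream the rows through `csv.reader` instead of building a list. A minimal sketch under that assumption; `raise_field_size_limit`, `count_records` and the `has_header` parameter are hypothetical names chosen for illustration, and raising the limit affects every csv reader in the process:

```python
import csv
import sys


def raise_field_size_limit():
    """Raise the csv per-field limit as high as this platform accepts.

    sys.maxsize can overflow the C long used internally on some
    platforms (e.g. 64-bit Windows), so back off until it is accepted.
    """
    limit = sys.maxsize
    while True:
        try:
            csv.field_size_limit(limit)
            return limit
        except OverflowError:
            limit //= 10


def count_records(path, has_header=True):
    """Count CSV records in path without keeping the rows in memory."""
    raise_field_size_limit()
    with open(path, newline="") as f:
        # The generator expression consumes the reader one row at a time.
        count = sum(1 for _ in csv.reader(f))
    # Optionally leave the header row out of the total.
    if has_header and count > 0:
        count -= 1
    return count
```

This still reads the whole file, as the comments point out, but memory use stays roughly constant instead of growing with the file size.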