In the following script, is there a way to find out how many "chunks" there are in total?

import pandas as pd
import numpy as np

data = pd.read_csv('data.txt', delimiter=',', chunksize=50000)

for chunk in data:
    print(chunk)

Using len(chunk) only gives me the number of rows in each chunk, not the number of chunks.

Is there a way to get the total without counting the iterations manually?

Michael Currie
Leb

1 Answer

CSV, being row-based, does not allow a process to know how many lines there are in it until after it has all been scanned.

Very minimal scanning is necessary, though, assuming the CSV file is well formed:

sum(1 for row in open('data.txt', 'r'))

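This is useful if you need to calculate in advance how many chunks there are: divide the number of data rows (the line count minus the header line, if there is one) by the chunk size and round up. A full CSV reader is overkill for this. The line-counting expression above has very low memory requirements and does minimal parsing.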
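As a minimal sketch of turning that line count into a chunk count: the helper name count_chunks and its has_header parameter are hypothetical, introduced only for illustration. It assumes one record per line (no quoted fields containing embedded newlines) and no blank lines, since either would make the raw line count differ from the number of rows pandas actually parses.

```python
import math


def count_chunks(path, chunksize, has_header=True):
    """Estimate how many chunks pd.read_csv(path, chunksize=chunksize) would yield.

    Hypothetical helper: assumes one record per physical line and no blank lines.
    """
    # Count physical lines without loading the whole file into memory.
    with open(path, 'r') as f:
        n_lines = sum(1 for _ in f)

    # The header line is not a data row, so leave it out of the count.
    n_rows = n_lines - 1 if has_header else n_lines

    # The last chunk may be partial, so round up.
    return math.ceil(max(n_rows, 0) / chunksize)
```

For the script in the question this would be called as count_chunks('data.txt', 50000) before the loop. The file is read twice in total (once to count, once to parse), which is the price of knowing the number of chunks up front.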

Ami Tavory