The aim is to find the total number of rows in a large CSV file. I'm using Python Dask for this at the moment, but since the file is around 45 GB it takes quite some time. Unix `cat` piped into `wc -l` seems to perform much better.
So the question is: are there any tweaks for Dask's / pandas' `read_csv` to make it find the total number of rows faster?
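For reference, `wc -l` is fast because it only counts newline bytes and never parses CSV fields, which `read_csv` always does. A pure-Python equivalent that streams the file in binary chunks gets much closer to `wc -l` than any `read_csv` tuning; this is a minimal sketch, where `count_lines` and the chunk size are my own choices, not part of Dask or pandas:

```python
def count_lines(path, chunk_size=1 << 20):
    """Count newline bytes by streaming the file in 1 MB binary chunks,
    mimicking what `wc -l` does (no CSV parsing at all)."""
    count = 0
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            count += chunk.count(b"\n")
    return count
```

Note this counts physical lines, so it overcounts rows by one if there is a header, and it miscounts if any quoted field contains embedded newlines; a full CSV parse is only needed in that latter case.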