
I have a large single-column CSV file. I need to read each row, convert it to a float, and then find the min, max, and mean for each chunk of data. The data has 16 decimal digits of precision.

I have tried processing it with pandas in chunks, but I am new to pandas and don't seem to understand how each defined chunk (1000 rows by 1 column) is treated.

How can I convert each row in the chunk to a float, building a list, so that I can then find the min, max, and mean?

    import pandas as pd

    chunk_size = 1000
    for chunk in pd.read_csv(filename, chunksize=chunk_size):
        # each chunk is a DataFrame of up to 1000 rows and one column
        mpg = [float(value) for value in chunk.iloc[:, 0]]
        print(mpg)

        tmpMax = max(mpg)
        print(tmpMax)
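
A minimal sketch of one way to do this with pandas itself rather than building Python lists (assumptions not stated above: the file has no header row, and `filename` holds its path): each chunk is an ordinary DataFrame, so its single column can be taken as a float Series and summarized with the Series' own `min()`, `max()`, and `mean()`.

    import pandas as pd

    chunk_size = 1000
    # header=None and dtype=float are assumptions about the file layout
    for chunk in pd.read_csv(filename, chunksize=chunk_size, header=None, dtype=float):
        col = chunk[0]  # the only column, read as a float64 Series
        print(col.min(), col.max(), col.mean())
        # col.tolist() would give a plain Python list of floats, if one is needed

Note that pandas stores the values as float64, which carries roughly 15-17 significant decimal digits, so 16-decimal input may not round-trip exactly.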
  • please share 2-3 lines of your data so we get an impression of what it looks like: [edit] your question and add it as "code" for formatting purposes – Patrick Artner Jan 17 '19 at 20:03
  • Have you seen this blog? [How to Load a Massive File as small chunks in Pandas?](https://cmdlinetips.com/2018/01/how-to-load-a-massive-file-as-small-chunks-in-pandas/) I have used something like this to load large datasets that limit my upload rates. – Jesse Jan 17 '19 at 20:42
  • [Stackoverflow: how to read a 6 GB csv](https://stackoverflow.com/questions/25962114/how-to-read-a-6-gb-csv-file-with-pandas) – Jesse Jan 18 '19 at 21:27

0 Answers