
I'm looking for a Pythonic way to iterate over a dataframe's index so that I can split a computationally heavy job into a series of smaller chunks to run. The output of each chunk will be appended to a CSV in order to avoid hitting resource limits.

For example, if I have some list whose length is prime, I'd like to split that list into a number of lists of relatively equal length, run the computations against each set, and append the output of that set to a CSV (rough sketch below). Rinse and repeat all the way down the index of the dataframe until every row has been processed.

e.g.

  1. Run some function on the first 10,000 rows - store in CSV
  2. Run on rows 10,001 - 20,000 - store in CSV
  3. .....
  4. Run through row 111,376 - store in CSV
  5. End.
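
Roughly the kind of split I have in mind, as a sketch (the length and number of pieces here are just examples):

```python
import numpy as np

# Even when the length doesn't divide evenly, array_split returns pieces
# whose lengths differ by at most one.
items = list(range(100_003))
pieces = np.array_split(items, 10)
print([len(p) for p in pieces])  # three pieces of 10,001, seven of 10,000
```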
broseidon
  • Have you tried [iterating over slices of the frame](https://stackoverflow.com/a/1335456/364696)? – ShadowRanger Dec 27 '18 at 01:44
  • If you are working on `pandas.DataFrame` and worried about the resource limits, you can try passing `iterator=True` when you read your csv, and do `yourDF.get_chunk(n)` where n is your desired number of rows. – Chris Dec 27 '18 at 02:21
  • Thanks to all! @ShadowRanger I was able to use the Chunker object classification to iterate over slices successfully. – broseidon Dec 27 '18 at 14:56
  • @Chris chunking on the way in helped immensely, I'm now below 50% of my memory usage running this script. Thank you. – broseidon Dec 27 '18 at 14:56
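
A minimal sketch of @Chris's suggestion above, assuming the data lives in a file called `data.csv` and a chunk size of 10,000 rows (both are placeholders):

```python
import pandas as pd

# read_csv with iterator=True returns a reader we can pull chunks from,
# so the whole file never has to sit in memory at once.
reader = pd.read_csv("data.csv", iterator=True)
first = True

while True:
    try:
        chunk = reader.get_chunk(10_000)
    except StopIteration:
        break
    result = chunk  # placeholder: run the heavy computation on this chunk
    # Append each chunk's output; only write the header the first time.
    result.to_csv("output.csv", mode="a", header=first, index=False)
    first = False
```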

1 Answer


Per @ShadowRanger:

Have you tried [iterating over slices of the frame](https://stackoverflow.com/a/1335456/364696)? - see *Iteration over list slices*.
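
A minimal sketch of that approach, with a throwaway frame and a placeholder for the heavy computation:

```python
import numpy as np
import pandas as pd

# Throwaway frame standing in for the real data.
df = pd.DataFrame({"value": np.arange(111_376)})

def heavy_function(chunk):
    # Placeholder for the computationally heavy work on one slice.
    return chunk.assign(result=chunk["value"] ** 2)

chunk_size = 10_000

for start in range(0, len(df), chunk_size):
    # .iloc slices by position, so the final shorter slice needs no special case.
    piece = df.iloc[start:start + chunk_size]
    heavy_function(piece).to_csv(
        "output.csv", mode="a", header=(start == 0), index=False
    )
```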

broseidon