
I am wondering if there is a Python or Pandas function that approximates the Ruby #each_slice method. In this example, the Ruby #each_slice method will take the array or hash and break it into groups of 100.

var.each_slice(100) do |batch|
  # do some work on each batch
end

I am trying to do this same operation on a Pandas dataframe. Is there a Pythonic way to accomplish the same thing?

I have checked out this answer: Python equivalent of Ruby's each_slice(count)

However, it is old and is not Pandas specific. I am checking it out but am wondering if there is a more direct method.

analyticsPierce

1 Answer


There isn't a built-in pandas method as such, but you can use NumPy's array_split: pass it the dataframe and the number of slices you want.

In order to get slices of ~100 rows you'll have to calculate the slice count yourself, which is simply the number of rows divided by 100:

import numpy as np
# df.shape returns the dimensions as a tuple; the first element is the number of rows.
# Use integer division so array_split receives an int.
np.array_split(df, df.shape[0] // 100)

This returns a list of dataframes sliced as evenly as possible.
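To illustrate, here's a minimal runnable sketch; the dataframe and its 250-row size are made up for the example:

```python
import numpy as np
import pandas as pd

# Hypothetical dataframe standing in for the real data.
df = pd.DataFrame({"value": range(250)})

# Aim for batches of roughly 100 rows. array_split accepts a count that
# doesn't divide the row count evenly and spreads the remainder across
# the resulting dataframes; guard against a zero slice count for small frames.
n_batches = max(df.shape[0] // 100, 1)
batches = np.array_split(df, n_batches)

for batch in batches:
    pass  # do some work on each batch, as in the Ruby example
```

Note that with 250 rows this yields two batches of 125 rather than capping each batch at 100; if you need a hard upper bound of 100 rows per slice, round the slice count up instead of down.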

EdChum