
I have a DataFrame that I want to slice into several DataFrames by adding rows one at a time until the sum of the Score column in the current slice is greater than 50,000. Once that condition is met, I want a new slice to begin.

Here is an example of what this might look like:


1 Answer


Take the cumulative sum of Score, floor divide it by 50,000, and shift the result down one cell, so that the row that pushes the running total past a multiple of 50,000 stays in the slice it completes instead of starting the next one.

import pandas as pd
import numpy as np

# Generating DataFrame with random data
df = pd.DataFrame({'Score': np.random.randint(1, 60000, 15)})

# Creating new column that's a cumulative sum with each
# value floor divided by 50000
df['groups'] = df['Score'].cumsum() // 50000

# Values shifted down one so that the row crossing a 50000 boundary
# closes the current slice; the first row has no predecessor and gets 0
df['groups'] = df['groups'].shift(1, fill_value=0)

Then, as per this answer, you can use pandas.DataFrame.groupby in a list comprehension to return a list of the split DataFrames.

df_list = [df_slice for _, df_slice in df.groupby('groups')]
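
One caveat with the floor-division labels: they come from a single running total that is never reset, so whatever a slice overshoots 50,000 by is carried into the next one, and later slices can end up summing to less than 50,000. If every slice has to pass the threshold on its own, a plain loop that resets the total is one option. The following is only a sketch under that assumption; `split_by_running_sum` and its parameters are hypothetical names chosen for illustration, not part of pandas.

```python
import numpy as np
import pandas as pd


def split_by_running_sum(df, column="Score", threshold=50_000):
    """Hypothetical helper: cut df into consecutive slices, closing a slice
    as soon as its own sum of `column` is greater than `threshold`."""
    labels = []
    running_total = 0
    group_id = 0
    for value in df[column]:
        running_total += value
        labels.append(group_id)
        if running_total > threshold:
            # This row completed the slice: start a new one and reset the total
            group_id += 1
            running_total = 0
    # Group by an array of labels (one per row) instead of a column name
    return [part for _, part in df.groupby(np.asarray(labels))]


# Small deterministic example
df = pd.DataFrame({"Score": [20_000] * 6})
df_list = split_by_running_sum(df)
print([int(part["Score"].sum()) for part in df_list])  # [60000, 60000]
```

The last slice may still fall short of the threshold when the remaining rows run out before the total is reached.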