
I have this dataframe:

  class  number_of_lessons
0     C                 15
1     A                  5
2     B                  8
3     E                 11
4     N                 12
5     F                  5
6     D                 10

I want to randomly select some rows from this table such that the sum of number_of_lessons in the new dataframe equals 20.

For example, the output could be:

  class  number_of_lessons
0     C                 15
1     A                  5

or

  class  number_of_lessons
1     A                  5
5     F                  5
6     D                 10

I have tried many things, but nothing worked. Any idea how to do it?

Samorix
  • Does this answer your question? [randomly sample rows of a dataframe until the desired sum of a column is reached](https://stackoverflow.com/questions/43509114/randomly-sample-rows-of-a-dataframe-until-the-desired-sum-of-a-column-is-reached) – sushanth Jun 08 '20 at 17:12
  • Did you try [subset sum](https://stackoverflow.com/questions/23087820/python-subset-sum) and then index by those elements that made the sum? – Balaji Ambresh Jun 08 '20 at 17:25
  • That totally answers it, @Sushanth, thank you. I just added a "while ((iteration<50) & (acres<20)):" loop before the suggested solution so that it stops when the sum is exactly 20 – Samorix Jun 08 '20 at 17:36
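A brute-force take on the subset-sum suggestion above: list every combination of rows whose lessons add up to 20, then pick one of those combinations at random. This is only a sketch that assumes a small table like the example, because itertools.combinations enumerates all subsets and the work grows exponentially with the number of rows; target and valid_subsets are illustrative names, not taken from the linked answers.

```python
import itertools
import random

import pandas as pd

df = pd.DataFrame({'class': ['C', 'A', 'B', 'E', 'N', 'F', 'D'],
                   'number_of_lessons': [15, 5, 8, 11, 12, 5, 10]})
target = 20

# Every combination of row labels whose lessons add up exactly to the target.
# Exponential in the number of rows, so only practical for small tables.
valid_subsets = [
    combo
    for r in range(1, len(df) + 1)
    for combo in itertools.combinations(df.index, r)
    if df.loc[list(combo), 'number_of_lessons'].sum() == target
]

# Pick one valid combination uniformly at random.
chosen = random.choice(valid_subsets)
result = df.loc[list(chosen)]
print(result)
```

For this table there are four valid selections (C+A, C+F, B+N and A+F+D), and each is returned with equal probability. If valid_subsets comes back empty, no combination of rows adds up to the target, and random.choice would raise an IndexError.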

2 Answers


Try the following code:

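import pandas as pd

df = pd.DataFrame({'class': ['C', 'A', 'B', 'E', 'N', 'F', 'D'],
                   'number_of_lessons': [15, 5, 8, 11, 12, 5, 10]})

lessons = 0
while lessons != 20:
    # One greedy pass can get stuck below 20 (e.g. E then B gives 19), so reshuffle and retry
    classes = []
    lessons = 0
    for _, row in df.sample(frac=1).iterrows():
        # Keep the row only if it does not push the total above 20
        if lessons + row['number_of_lessons'] <= 20:
            lessons += row['number_of_lessons']
            classes.append(row['class'])
        if lessons == 20:
            break

print(df[df['class'].isin(classes)])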

Output:

  class  number_of_lessons
2     B                  8
4     N                 12
IMB
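The retry loop can also be given an attempt cap, in the spirit of the "iteration<50" comment on the question, plus a seed so that runs are reproducible. The helper below is a hypothetical sketch (sample_rows_to_target, max_attempts and seed are made-up names, not a pandas API); it reuses the greedy "take the row if it still fits" idea and returns None if every attempt gets stuck below the target.

```python
import pandas as pd

df = pd.DataFrame({'class': ['C', 'A', 'B', 'E', 'N', 'F', 'D'],
                   'number_of_lessons': [15, 5, 8, 11, 12, 5, 10]})


def sample_rows_to_target(frame, column, target, max_attempts=50, seed=None):
    """Hypothetical helper: greedy random pick that reshuffles when it gets stuck.

    Returns the selected rows, or None if no attempt hits the target exactly.
    """
    for attempt in range(max_attempts):
        # A different (but reproducible) shuffle per attempt when a seed is given
        state = None if seed is None else seed + attempt
        total = 0
        picked = []
        for idx, value in frame.sample(frac=1, random_state=state)[column].items():
            # Take the row only if it does not overshoot the target
            if total + value <= target:
                total += value
                picked.append(idx)
            if total == target:
                return frame.loc[sorted(picked)]
    return None


result = sample_rows_to_target(df, 'number_of_lessons', 20, seed=0)
print(result)
```

On this table a single greedy pass reaches exactly 20 only part of the time (for example, E followed by B stops at 19 because no remaining row fits), which is why the restart matters. Capping the attempts keeps the loop from running forever when the target is unreachable.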

Here's a simple approach: keep reshuffling until the selected rows add up to exactly 20.

df_new = df
while df_new.number_of_lessons.sum() != 20:
    df_shuffled = df.sample(frac=1)  # shuffle the rows
    # Keep the leading rows whose running total stays at or below 20
    df_new = df_shuffled[df_shuffled.number_of_lessons.cumsum() <= 20]

print(df_new)

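Inside the loop, cumsum() builds a running total of number_of_lessons over the shuffled rows, and the '<= 20' filter keeps only the leading rows whose running total does not exceed 20. The loop repeats until those rows add up to exactly 20.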
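To see the cumsum filter at work without randomness, the sketch below applies it to two fixed row orders instead of a shuffle (the orders are chosen purely for illustration). The first order reaches exactly 20; the second stops at 19, which is the case where the while loop has to reshuffle.

```python
import pandas as pd

df = pd.DataFrame({'class': ['C', 'A', 'B', 'E', 'N', 'F', 'D'],
                   'number_of_lessons': [15, 5, 8, 11, 12, 5, 10]})

# Original order: running total is 15, 20, 28, 39, 51, 56, 66
running_total = df['number_of_lessons'].cumsum()
print(running_total.tolist())

# All values are positive, so the running total only grows and the mask keeps a prefix (C, A)
prefix = df[running_total <= 20]
print(prefix)

# A different order (E, B, C, A, N, F, D): running total is 11, 19, 34, ...
reordered = df.loc[[3, 2, 0, 1, 4, 5, 6]]
kept = reordered[reordered['number_of_lessons'].cumsum() <= 20]
print(kept['number_of_lessons'].sum())  # 19, so the loop would try another shuffle
```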