I have got a pd.DataFrame, with each line representing a group of people. They have got an id (several columns in my dataframe, but here summarized by the "id"
column of my example dataframe). Each of this group represent several people (column "size"
).
I am trying to divide theses groups into smaller groups of a maximal size "max_size"
. For example, if max_size = 5
a row with "id" = "foo"
and "size" = 13
should be replaced by three rows, all with "id" = "foo"
and respective "size" = 5
, "size" = 5
, and "size" = 3
I have coded an usable function, but am looking for a more pandas idiomatic way to do it, if it exists.
my function is
def custom_duplicating_function(df):
def aux_custom_duplicating_function(row, max_size=5):
row = row.to_dict()
size = row["size"]
L = [row.copy() for i in range((size // max_size + 1))]
for i in range(len(L) - 1):
L[i]["size"] = max_size
L[-1]["size"] = size%max_size
return(pd.DataFrame.from_dict(L))
temp = df.apply(aux_custom_duplicating_function, axis=1)
result = pd.concat([temp[i] for i in range(len(temp.index))])
return(result)
The following dataframe
test = pd.DataFrame.from_dict([{"id":"foo", "size":13},
{"id":"bar", "size":17},
{"id":"baz", "size":3}])
************
id size
0 foo 13
1 bar 17
2 baz 3
************
should be transformed in
id size
0 foo 5
1 foo 5
2 foo 3
0 bar 5
1 bar 5
2 bar 5
3 bar 2
0 baz 3