I have a dataframe that looks like this:
a b c
0 x x x
1 y y y
2 z z z
I would like to apply a function to each row of dataframe. That function then creates a new dataframe with multiple rows from each input row and returns it. Here is my_func:
def my_func(df):
dup_num = int(df.c - df.a)
if isinstance(df, pd.Series):
df_expanded = pd.concat([pd.DataFrame(df).transpose()]*dup_num,
ignore_index=True)
else:
df_expanded = pd.concat([pd.DataFrame(df)]*dup_num,
ignore_index=True)
return df_expanded
The final dataframe will look like something like this:
a b c
0 x x x
1 x x x
2 y y y
3 y y y
4 y y y
5 z z z
6 z z z
So I did:
df_expanded = df.apply(my_func, axis=1)
I inserted breakpoints inside the function and for each row, the created dataframe from my_func is correct. However, at the end, when the last row returns, I get an error stating that:
ValueError: cannot copy sequence with size XX to array axis with dimension YY
As if apply is trying to return a Series not a group of dataFrames that the function created.
So instead of df.apply I did:
df_expanded = df.groupby(df.index).apply(my_func)
Which just creates groups of single rows and applies the same function. This on the other hand works.
Why?