pandas apply function that returns more rows

Question

I have a DataFrame with multiple columns, some of which contain lists. I would like to apply a function to each row that expands it into n rows (n varies from row to row) after some manipulation of the lists.

A simplified version of this can be seen here:

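import numpy as np
import pandas as pd

df = pd.DataFrame({'id': [0, 1], 'value': [[0, 1, 2], [3, 4]]}).set_index('id')

def func(x):
    # Return the list and its element-wise square for one row
    v = np.array(x['value'])
    return pd.Series([v, v**2], index=['value', 'value_2'])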

My desired output is:

   id  value  value_2
0   0      0        0
1   0      1        1
2   0      2        4
3   1      3        9
4   1      4       16

If I apply the function, I get an output with the same number of rows as the original DataFrame, which I then need to reshape:

df.apply(func,axis=1)

        value    value_2
id
0   [0, 1, 2]  [0, 1, 4]
1      [3, 4]    [9, 16]

Is there a way to get the desired outcome without needing to reshape after applying the function?

Do `value` and `value_2` always have the same dimension? If the 3rd row has 7 elements in `value`, does the 3rd row of `value_2` have 7 elements as well? - Albert Alonso, Apr 16 '19 at 18:20

score 2 · Accepted Answer · answered Apr 16 '19 at 18:20

You can unnest the list column, then use vectorized operations (`unnest` here is a custom helper function, not part of pandas):

u = unnest(df.reset_index(), ['id'], ['value'])
u.assign(value_2=u.value**2)

   id  value  value_2
0   0      0        0
1   0      1        1
2   0      2        4
3   1      3        9
4   1      4       16
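
If the `unnest` helper is not at hand, roughly the same result can be sketched with `DataFrame.explode`, assuming pandas 0.25 or newer. This swaps in a built-in for the helper used above and is not the answerer's original code; the names `u` and `result` are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({'id': [0, 1], 'value': [[0, 1, 2], [3, 4]]}).set_index('id')

u = (
    df.reset_index()                 # bring 'id' back as a regular column
      .explode('value')              # one row per list element
      .reset_index(drop=True)        # renumber rows 0..n-1
      .astype({'value': 'int64'})    # explode leaves an object column
)

# Vectorized operation on the flattened column
result = u.assign(value_2=u['value'] ** 2)
print(result)
```

The `astype` step matters because `explode` keeps the column as `object` dtype; casting first keeps the later arithmetic vectorized.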

user3483203

Umar.H · Answer 2 · 2019-04-16T19:05:04.003

0

Another option is to use `pd.Series` + `stack`:

df = df.value.apply(pd.Series).stack().reset_index(level=0).rename(columns={0:'value'}).set_index('id')
df = df.apply(func, axis=1)
print(df)
    value  value_2
id
0     0.0      0.0
0     1.0      1.0
0     2.0      4.0
1     3.0      9.0
1     4.0     16.0
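
The float values in the output above most likely come from `apply(pd.Series)` padding the shorter list with NaN. A minimal sketch of the same idea that drops the padding explicitly, casts back to integers, and computes the square without a row-wise `apply` (the names `wide` and `tall` are illustrative, not from the answer):

```python
import pandas as pd

df = pd.DataFrame({'id': [0, 1], 'value': [[0, 1, 2], [3, 4]]}).set_index('id')

# One column per list position; the shorter list is padded with NaN,
# which forces a float dtype
wide = df['value'].apply(pd.Series)

tall = (
    wide.stack()
        .dropna()                          # explicit, in case stack keeps NaN
        .astype('int64')                   # undo the float upcast
        .reset_index(level=1, drop=True)   # drop list position, keep 'id'
        .to_frame('value')
)
tall['value_2'] = tall['value'] ** 2
print(tall)
```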

edited Apr 16 '19 at 19:05

answered Apr 16 '19 at 18:57
