Python/pandas - Using DataFrame.apply with function returning dictionary

Question

I know how to use apply on a DataFrame to calculate new columns and append them to it. My question is: if I have a function that takes several values as parameters (corresponding to columns currently in the DataFrame) and returns a dictionary (corresponding to the columns I want to add), is there a simple, elegant way to apply this function to the DataFrame and generate the new columns?

For example, currently I am doing this:

import pandas as pd
import numpy as np

col1 = [np.random.randn()] * 10
col2 = [np.random.randn()] * 10
col3 = [np.random.randn()] * 10

df = pd.DataFrame({'col1': col1,
                   'col2': col2,
                   'col3': col3 })

df['col4'] = df.apply(lambda x: get_col4(x['col1'], x['col2']), axis=1)
df['col5'] = df.apply(lambda x: get_col5(x['col1'], x['col2'], x['col3']), axis=1)
df['col6'] = df.apply(lambda x: get_col6(x['col3'], x['col4'], x['col5']), axis=1)
df['col7'] = df.apply(lambda x: get_col7(x['col4'], x['col6']), axis=1)

where I have an individual function for each calculated column, each of which depends on some combination of the previous columns.

However, because the values of the calculated columns are dependent on each other, I think it would be much more efficient and elegant to use a function like the one below to calculate the new columns all at once:

def get_cols(col1, col2, col3):
    #some calculations...
    return {'col4': col4,
            'col5': col5,
            'col6': col6,
            'col7': col7}

Is there a way to do this using pandas?

Can you give example input and output for this? Even for just a representative column, potentially you don't need for all the columns you're trying to create (?). This looks like it could be an unnecessarily slow running way of approaching your problem. — roganjosh, Oct 18 '17 at 13:25
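To illustrate the performance concern raised in the comment: if the per-column calculations can be written as plain arithmetic on whole columns, apply may not be needed at all. The sketch below assumes made-up formulas as stand-ins for get_col4 through get_col7, since the real ones are not shown in the question.

```python
import pandas as pd

df = pd.DataFrame({'col1': [1.0, 2.0],
                   'col2': [3.0, 4.0],
                   'col3': [5.0, 6.0]})

# Hypothetical stand-ins for get_col4 ... get_col7 (the real formulas are
# not given in the question). Each line operates on whole columns at once.
df['col4'] = df['col1'] + df['col2']
df['col5'] = df['col1'] * df['col2'] * df['col3']
df['col6'] = df['col3'] + df['col4'] + df['col5']
df['col7'] = df['col4'] - df['col6']

print(df)
```

Whether this applies depends on what the real functions do; logic that cannot be expressed column-wise would still need a row-wise approach.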

azizj · Answer 1 · 2017-10-18T16:06:08.763

Since you want to retain the previous columns, you can make a Series out of the new columns, and then append that new Series object to the original Series. Keep in mind that the input to get_cols is an individual row (and is thus a Series) from the original DataFrame.

import pandas as pd
import numpy as np

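def get_cols(cols):
    # cols is one row of the DataFrame, passed in as a Series
    col4 = cols['col1'] * 2
    col5 = cols['col2'] * 2
    col6 = cols['col3'] * 2
    new_cols = pd.Series([col4, col5, col6], index=['col4', 'col5', 'col6'])
    # Series.append was removed in pandas 2.0; pd.concat does the same job
    return pd.concat([cols, new_cols])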

col1 = [np.random.randn()] * 10
col2 = [np.random.randn()] * 10
col3 = [np.random.randn()] * 10

df = pd.DataFrame({'col1': col1,
                   'col2': col2,
                   'col3': col3 })

df = df.apply(get_cols, axis=1)
print(df)

       col1      col2      col3      col4      col5      col6
0 -0.809803  0.522547  0.064061 -1.619606  1.045093  0.128122
1 -0.809803  0.522547  0.064061 -1.619606  1.045093  0.128122
2 -0.809803  0.522547  0.064061 -1.619606  1.045093  0.128122
3 -0.809803  0.522547  0.064061 -1.619606  1.045093  0.128122
4 -0.809803  0.522547  0.064061 -1.619606  1.045093  0.128122
5 -0.809803  0.522547  0.064061 -1.619606  1.045093  0.128122
6 -0.809803  0.522547  0.064061 -1.619606  1.045093  0.128122
7 -0.809803  0.522547  0.064061 -1.619606  1.045093  0.128122
8 -0.809803  0.522547  0.064061 -1.619606  1.045093  0.128122
9 -0.809803  0.522547  0.064061 -1.619606  1.045093  0.128122
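If the function has to keep the signature from the question (separate values in, dictionary out), one possible variation is to wrap the returned dictionary in a Series inside the lambda and join the result back onto the original frame. This is a sketch only: the calculations inside get_cols are invented for illustration, and only two of the four new columns are shown.

```python
import pandas as pd

def get_cols(col1, col2, col3):
    # Hypothetical calculations, for illustration only
    col4 = col1 + col2
    col5 = col4 * col3
    return {'col4': col4, 'col5': col5}

df = pd.DataFrame({'col1': [1.0, 2.0],
                   'col2': [3.0, 4.0],
                   'col3': [5.0, 6.0]})

# Returning a Series from the applied function makes apply build a DataFrame
# whose columns are the dictionary keys.
new_cols = df.apply(
    lambda row: pd.Series(get_cols(row['col1'], row['col2'], row['col3'])),
    axis=1,
)

# Attach the new columns to the original ones (aligned on the index)
result = df.join(new_cols)
print(result)
```

Building a Series per row adds overhead, so this favors readability over speed on large frames.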

score 1 · Answer 2 · answered Oct 18 '17 at 15:25

This related question might help you: "pandas apply function that returns multiple values to rows in pandas dataframe".

The right method is to have your second function, get_cols, return a list instead of a dictionary, and then use apply.
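A minimal sketch of the list-returning approach, assuming pandas 0.23 or later (where apply accepts result_type) and using invented calculations in place of the real ones:

```python
import pandas as pd

def get_cols(col1, col2, col3):
    # Hypothetical calculations, for illustration only
    col4 = col1 + col2
    col5 = col4 * col3
    return [col4, col5]

df = pd.DataFrame({'col1': [1.0, 2.0],
                   'col2': [3.0, 4.0],
                   'col3': [5.0, 6.0]})

# result_type='expand' turns each returned list into a row of a new DataFrame
expanded = df.apply(
    lambda row: get_cols(row['col1'], row['col2'], row['col3']),
    axis=1,
    result_type='expand',
)

# The expanded columns are numbered 0, 1, ...; give them meaningful names
expanded.columns = ['col4', 'col5']
result = df.join(expanded)
print(result)
```

With a list, the column names have to be supplied separately and kept in the same order as the returned values.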

Rockbar