Merge dataframe with another dataframe created from apply function?

Question

I have a dataframe, df, of salary data:

State,Annual Salary
New York, 132826
New Hampshire,128704
California,127388
Vermont,121599
Idaho,120011

And a function, get_taxes_from_api that calls an API and returns tax numbers for an input State and Annual Salary as a data frame with only 1 row like this:

State,annual.fica.amount,annual.federal.amount,annual.state.amount
North Carolina,8918,40334,6364

it is of type: <class 'pandas.core.frame.DataFrame'> not series. I want to call the API on each row of df and then merge each of the resulting 1 row data frames. Something like:

State,Annual Salary,annual.fica.amount,annual.federal.amount,annual.state.amount
North Carolina, 116500,8918,40334,6364
New York, 132826, . . . 
New Hampshire,128704, . . . 
California,127388, . . . 
Vermont,121599, . . . 
Idaho,120011, . . .

How do I do this? I ran into an error that my lambda below created a Series instead of a dataframe so went down this rabbit hole of using result_type=expand like here 'https://stackoverflow.com/a/62849468/2415706' but it is still broken:

all_tax_df = df[['State','Annual Salary']].apply(lambda row: get_taxes_from_api(row['State'],row['Annual Salary']), axis=1, result_type='expand')


# merge all_tax_df with df on 'State'

SeaBean · Accepted Answer · 2021-04-03T19:17:31.087

You don't need to specify result_type for the .apply() call when the applied function inside .apply() returns a pandas Series. From the official document:

Returning a Series inside the function is similar to passing result_type='expand'. The resulting column names will be the Series index.

Moreover, for the default result_type=None:

The default behaviour (None) depends on the return value of the applied function: .... if the apply function returns a Series these are expanded to columns.

As your API (the applied function inside .apply() call) returns a 1-row DataFrame, you can readily convert it to a Series by the .squeeze() call, like below:

df[['State','Annual Salary']].apply(
    lambda row: get_taxes_from_api(row['State'], row['Annual Salary']).squeeze(), axis=1)

The result of this apply() call contains only the columns returned by the API call, which does not include the column Annual Salary in your original DataFrame. You can call .merge() to merge the original dataframe with the resulting dataframe returned from apply() to get your desired layout. You can do the 2 steps in one line:

all_tax_df = df.merge(df[['State','Annual Salary']].apply(
    lambda row: get_taxes_from_api(row['State'], row['Annual Salary']).squeeze(), axis=1))

Test Run

col = ['State','Annual Salary']
dat = [['New York', 132826], ['New Hampshire',128704], ['California',127388], ['Vermont',121599], ['Idaho',120011]]
df = pd.DataFrame(dat, columns=col)

def get_taxes_from_api(state, annual_salary):
    return pd.DataFrame({'State': [state], 
                         'annual.fica.amount': [int(annual_salary * 0.067)], 
                         'annual.federal.amount': [int(annual_salary * 0.3)], 
                         'annual.state.amount': [int(annual_salary * 0.048)]})


all_tax_df = df.merge(df[['State','Annual Salary']].apply(
    lambda row: get_taxes_from_api(row['State'], row['Annual Salary']).squeeze(), axis=1))



print(all_tax_df)




           State  Annual Salary  annual.fica.amount  annual.federal.amount  annual.state.amount
0       New York         132826                8899                  39847                 6375
1  New Hampshire         128704                8623                  38611                 6177
2     California         127388                8534                  38216                 6114
3        Vermont         121599                8147                  36479                 5836
4          Idaho         120011                8040                  36003                 5760

I tried what you have here fixing that typo, still doesn't work. get_taxes_from_api doesn't return a pd.Series, it returns a dataframe. Here's what I get when I write a test printing its type and what it returns: ` annual.fica.amount annual.federal.amount annual.state.amount State Initial State 0 7650 15103.5 0 TX Texas ` — user2415706, Apr 02 '21 at 18:27
@user2415706 See my edit above to cater for when DataFrame is returned by the api I have tested it with modified model api function. — SeaBean, Apr 02 '21 at 19:04
Ok thank you that modification that casts back into series works! — user2415706, Apr 02 '21 at 19:26
@user2415706 I have further simplified the codes by defining a custom function so that we don't need to call api twice in the lambda function. — SeaBean, Apr 02 '21 at 19:29
@user2415706 The the 1-row dataframe can readily be converted to a pandas Series by using `.squeeze()`. See my consolidated edit above. The codes can be much simplified and used as the final version. — SeaBean, Apr 03 '21 at 19:26
Thank you for this update, yeah it looks much nicer with squeeze(). — user2415706, Apr 08 '21 at 22:29

score 1 · Answer 2 · answered Apr 02 '21 at 07:19

You can do this simply by assigning directly the apply result to new columns :

df[["annual.fica.amount","annual.federal.amount","annual.state.amount"]]=( 
    df[['State','Annual Salary']].apply(lambda row: get_taxes_from_api(row['State'],row['Annual Salary']), axis=1)
)

Merging would be a good idea if you'd get the whole dataframe in one call to the api (which might be more efficient)

score 1 · Answer 3 · answered Apr 02 '21 at 07:21

1

If you really do not want the result as a series, you can easily convert it to DataFrame.

import pandas as pd    

all_tax_df = pd.DataFrame(df[['State','Annual Salary']].apply(lambda row: get_taxes_from_api(row['State'],row['Annual Salary']), axis=1, result_type='expand'))

answered Apr 02 '21 at 07:21

J.M. Robles

614
5
9

tried this, I get: `ValueError: If using all scalar values, you must pass an index`, get_taxes_from_api returns a dataframe already, I wrote a test and this is what that test outputs as the type and the output: ` annual.fica.amount annual.federal.amount annual.state.amount State Initial State 0 7650 15103.5 0 TX Texas` – user2415706 Apr 02 '21 at 18:32

Merge dataframe with another dataframe created from apply function?

3 Answers3

Test Run

Linked