1

My question is related to this question:

Merge dataframe with another dataframe created from apply function?

Here is my version of the code:

import pandas as pd

col = ['State','Annual Salary']
dat = [['New York', 132826], ['New Hampshire',128704], ['California',127388], ['Vermont',121599], ['Idaho',120011]]
df = pd.DataFrame(dat, columns=col)

def get_taxes_from_api(state, annual_salary):
    return pd.DataFrame({'State': [state, state], 
                         'annual.fica.amount': [int(annual_salary * 0.067),
                                                int(annual_salary * 1.067)], 
                         'annual.federal.amount': [int(annual_salary * 0.3),
                                                   int(annual_salary * 1.3)], 
                         'annual.state.amount': [int(annual_salary * 0.048),
                                                 int(annual_salary * 1.048)]})

How do I apply get_taxes_from_api to each row of df and combine the returned DataFrames into one DataFrame?

The only difference is that my function returns a multi-row DataFrame, not a one-row DataFrame, so the solution to the question above does not work for my situation. (And I don't have enough reputation to leave a comment there.)

GoldenYoyo

3 Answers

1

This doesn't directly answer your question, but here's one way that doesn't use apply. It builds both rows for each state with vectorized column arithmetic:

col = ['State','Annual Salary']
dat = [['New York', 132826], ['New Hampshire',128704], ['California',127388], ['Vermont',121599], ['Idaho',120011]]
df = pd.DataFrame(dat, columns=col)

#Create the "first" row of each state from your function by adding columns
df['annual.fica.amount'] = df['Annual Salary'].multiply(0.067)
df['annual.federal.amount'] = df['Annual Salary'].multiply(0.3)
df['annual.state.amount'] = df['Annual Salary'].multiply(0.048)

#Create the "second" row of each state as a new df
cumulative_df = df.copy()
cumulative_df['annual.fica.amount'] += cumulative_df['Annual Salary']
cumulative_df['annual.federal.amount'] += cumulative_df['Annual Salary']
cumulative_df['annual.state.amount'] += cumulative_df['Annual Salary']

#Concatenate the two tables and sort so each state's rows are adjacent (a stable sort keeps the "first" row before the "second")
final_df = pd.concat((df, cumulative_df)).sort_values('State', kind='stable').reset_index(drop=True)


mitoRibo
1

You could apply the function to each row, which gives a Series of DataFrames, and then combine them with pd.concat:

nested_df = df.apply(lambda x: get_taxes_from_api(x["State"],x["Annual Salary"]),axis=1)

result = pd.DataFrame()

for element in nested_df:
    result = pd.concat([result,element])

Result:

print(result)
           State  annual.fica.amount  annual.federal.amount  annual.state.amount
0       New York                8899                  39847                 6375
1       New York              141725                 172673               139201
0  New Hampshire                8623                  38611                 6177
1  New Hampshire              137327                 167315               134881
0     California                8534                  38216                 6114
1     California              135922                 165604               133502
0        Vermont                8147                  36479                 5836
1        Vermont              129746                 158078               127435
0          Idaho                8040                  36003                 5760
1          Idaho              128051                 156014               125771
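
A possible variation, shown only as a sketch: collect the per-row DataFrames in a list and call pd.concat once, which avoids re-copying the growing result on every loop iteration. The two-state df and the shortened get_taxes_from_api below are stand-ins for the ones in the question, and ignore_index=True is an optional choice that replaces the repeated 0/1 index with a fresh 0..n-1 index.

```python
import pandas as pd

df = pd.DataFrame(
    [["New York", 132826], ["New Hampshire", 128704]],
    columns=["State", "Annual Salary"],
)

def get_taxes_from_api(state, annual_salary):
    # Shortened stand-in for the question's function: two rows per state.
    return pd.DataFrame({
        "State": [state, state],
        "annual.fica.amount": [int(annual_salary * 0.067),
                               int(annual_salary * 1.067)],
    })

# Build every per-row DataFrame first, then concatenate them in one call.
frames = [
    get_taxes_from_api(state, salary)
    for state, salary in zip(df["State"], df["Annual Salary"])
]
result = pd.concat(frames, ignore_index=True)
print(result)
```

If the original 0/1 index within each state matters, drop ignore_index=True.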
Yefet
0

You can add a constant join key to both DataFrames and combine them with pd.merge:

df["df_merge_key"] = "#"
df_after_apply["df_merge_key"] = "#"
details_df = pd.merge(df, df_after_apply, how="left", on="df_merge_key").drop(labels=["df_merge_key"], axis=1)

This is simpler and neater in my opinion.
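
One thing to keep in mind: merging on a constant key pairs every row of the left frame with every row of the right frame. The sketch below contrasts that with a merge on "State", which may be closer to what the question wants if the goal is to attach each salary to its own tax rows. The df_after_apply here is a hypothetical stand-in (two tax rows per state, one column only), since the answer does not show how it is built.

```python
import pandas as pd

df = pd.DataFrame(
    [["New York", 132826], ["Idaho", 120011]],
    columns=["State", "Annual Salary"],
)

# Hypothetical stand-in for df_after_apply: two tax rows per state.
df_after_apply = pd.DataFrame({
    "State": ["New York", "New York", "Idaho", "Idaho"],
    "annual.state.amount": [6375, 139201, 5760, 125771],
})

# Constant-key merge: each of the 2 rows in df meets each of the 4 tax rows.
cross = pd.merge(
    df.assign(df_merge_key="#"),
    df_after_apply.assign(df_merge_key="#"),
    how="left",
    on="df_merge_key",
).drop(columns="df_merge_key")

# Merge on "State": each tax row is matched only with its own state's salary.
by_state = pd.merge(df, df_after_apply, how="left", on="State")

print(len(cross), len(by_state))
```

Because both inputs carry a "State" column, the constant-key result contains State_x and State_y, while the merge on "State" keeps a single State column.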

Anmol Deep