EDIT: Based on the comments, I have clarified the example to depict a more realistic use case.
I want to call a function with df.apply. This function returns multiple DataFrames, and I want to combine them into logical groups. I am unable to do that without using a for loop (which defeats the purpose of calling with apply).
I have tried calling the function for each row of the DataFrame, and that is slower than apply. However, with apply, combining the results slows things down again.
Any tips?
# input data frame
import numpy as np
import pandas as pd

data = {'Name': ['Ani', 'Bob', 'Cal', 'Dom'], 'Age': [15, 12, 13, 14], 'Score': [93, 98, 95, 99]}
df_in = pd.DataFrame(data)
print(df_in)
Output>
  Name  Age  Score
0  Ani   15     93
1  Bob   12     98
2  Cal   13     95
3  Dom   14     99
Function to be applied>
def func1(name, age):
    num_rows = np.random.randint(int(age/3))
    age_mul_1 = np.random.randint(low=1, high=age, size=num_rows)
    age_mul_2 = np.random.randint(low=1, high=age, size=num_rows)
    data = {'Name': [name]*num_rows, 'Age_Mul_1': age_mul_1, 'Age_Mul_2': age_mul_2}
    df_func1 = pd.DataFrame(data)
    return df_func1

def func2(name, age, score, other_params):
    num_rows = np.random.randint(int(score/10))
    score_mul_1 = np.random.randint(low=age, high=score, size=num_rows)
    data2 = {'Name': [name]*num_rows, 'score_Mul_1': score_mul_1}
    df_func2 = pd.DataFrame(data2)
    return df_func2

def ret_mul_df(row):
    df_A = func1(row['Name'], row['Age'])
    #print(df_A)
    df_B = func2(row['Name'], row['Age'], row['Score'], 1)
    #print(df_B)
    return df_A, df_B
What I want to do is essentially create two DataFrames, df_A_combined and df_B_combined.
However, this is how I am currently combining them:
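# Each element of df_out is a (df_A, df_B) tuple for one input row
df_out = df_in.apply(lambda row: ret_mul_df(row), axis=1)
df_A_combined = pd.DataFrame()
df_B_combined = pd.DataFrame()
# Note: DataFrame.append was removed in pandas 2.0, so this loop needs pandas < 2.0
for ser in df_out:
    df_A_combined = df_A_combined.append(ser[0], ignore_index=True)
    df_B_combined = df_B_combined.append(ser[1], ignore_index=True)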
print(df_A_combined)
  Name  Age_Mul_1  Age_Mul_2
0  Ani          7          8
1  Ani          1          4
2  Ani          1          8
3  Ani         12          6
4  Bob          9          8
5  Cal          8          7
6  Cal          8          1
7  Cal          4          8
print(df_B_combined)
   Name  score_Mul_1
0   Ani           28
1   Ani           29
2   Ani           50
3   Ani           35
4   Ani           84
5   Ani           24
6   Ani           51
7   Ani           28
8   Bob           32
9   Cal           26
10  Cal           70
11  Dom           56
12  Dom           53
How can I avoid the iteration?
func1 and func2 are calls to third-party libraries (which are very computation-intensive), and several such calls are made. Also, the DataFrames df_A_combined and df_B_combined cannot be combined with each other.
Note: This is a much-simplified example, and splitting the function would lead to a lot of redundancy.
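For reference, here is a minimal sketch of one possible direction: unzip the tuples returned by apply and call pd.concat once per group. It is not a benchmarked solution. The func1 and func2 bodies below are hypothetical, deterministic stand-ins for the third-party calls, chosen only so the sketch runs on its own. This still walks over the results once, but it avoids re-copying the accumulated DataFrame on every step, which the repeated append does.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for the expensive third-party calls (assumption:
# deterministic row counts so the result is reproducible).
def func1(name, age):
    num_rows = age // 5
    return pd.DataFrame({'Name': [name] * num_rows,
                         'Age_Mul_1': np.arange(num_rows) + 1})

def func2(name, age, score, other_params):
    num_rows = score // 30
    return pd.DataFrame({'Name': [name] * num_rows,
                         'score_Mul_1': np.arange(num_rows) + age})

def ret_mul_df(row):
    df_A = func1(row['Name'], row['Age'])
    df_B = func2(row['Name'], row['Age'], row['Score'], 1)
    return df_A, df_B

df_in = pd.DataFrame({'Name': ['Ani', 'Bob', 'Cal', 'Dom'],
                      'Age': [15, 12, 13, 14],
                      'Score': [93, 98, 95, 99]})

# apply returns a Series whose elements are (df_A, df_B) tuples
pairs = df_in.apply(ret_mul_df, axis=1)

# Transpose the sequence of pairs into one tuple of A frames and one of B frames
frames_A, frames_B = zip(*pairs)

# Concatenate each group once instead of appending row by row
df_A_combined = pd.concat(frames_A, ignore_index=True)
df_B_combined = pd.concat(frames_B, ignore_index=True)

print(df_A_combined)
print(df_B_combined)
```

With the real func1 and func2, only the last three statements would change the combining step; the per-row calls stay exactly as they are.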