I am trying to create a function to decrease the amount of my coding. The function uses exec() on a Pandas data frame to do some calculations. But when I put the code inside exec() it returns NaNs. Here is the function:
def FeatureMaker(data, criteria_col, f_data):
source = data.copy()
for key, value in f_data.items():
for i in source[criteria_col].unique():
source.set_value(source[source[criteria_col] == i].index.tolist(), key, exec('source[source[criteria_col] == i]["' + value[0] + '"].' + value[1] + '()'))
return source
The function is supposed to get a data frame, find the unique items of a column which has been passed as the column holding the criteria (in this case the 'id' column), and calculate what has been passed to it. In this function, argument data holds the data frame, criteria_col holds the name of a column which has the values of the criteria, and f_data is a dictionary whose keys are the names of the new columns and its values the column on which the calculation has to be performed and the calculation itself.This is an example of how I run the function:
FeatureMaker(spend_alpha, 'id', {'total_sum': ["spend", "sum"]})
In this example, spend_alpha is the name of my data frame, id is my criteria column, and I want to create the total_sum column by calculating the sum of the spend column.
The function runs perfectly. However, it returns nothing but NaNs. I tried the code below:for i in spend_alpha['id'].unique():
spend_alpha.set_value(spend_alpha[spend_alpha['id'] == i].index.tolist(), 'total_sum', spend_alpha[spend_alpha['id'] == i]['spend'].sum())
And it works without a problem. I get the total sum of each id. However, this code returns NaNs as well:
for i in spend_alpha['id'].unique():
spend_alpha.set_value(spend_alpha[spend_alpha['id'] == i].index.tolist(), 'total_sum', exec("spend_alpha[spend_alpha['id'] == i]['spend'].sum()"))
My question is, how can I use exec() on a Pandas data frame without getting NaNs? Thank you in advance.
Edit:
Reproducible example: Lets say I create the spend_alpha data frame as follows...In [1]: import pandas as pd
In [2]: spend_alpha = pd.DataFrame([[100, 250],
...: [101, 50],
...: [102, 60],
...: [100, 50],
...: [102, 30],
...: [101, 50]], columns=['id', 'spend'])
In [3]: spend_alpha
Out[3]:
id spend
0 100 250
1 101 50
2 102 60
3 100 50
4 102 30
5 101 50
I can add a new column called total_sum by running the code below...
In [4]: for i in spend_alpha['id'].unique():
...: spend_alpha.set_value(spend_alpha[spend_alpha['id'] == i].index.tolist(), 'total_sum', spend_alpha[spend_alpha['id'] == i]['spend'].sum())
...:
In [5]: spend_alpha
Out[5]:
id spend total_sum
0 100 250 300.0
1 101 50 100.0
2 102 60 90.0
3 100 50 300.0
4 102 30 90.0
5 101 50 100.0
However, if I place the third argument of the set_value() function inside exec() it well return NaNs. As shown below:
In [6]: for i in spend_alpha['id'].unique():
...: spend_alpha.set_value(spend_alpha[spend_alpha['id'] == i].index.tolist(), 'total_sum', exec("spend_alpha[spend_alpha['id'] == i]['spend'].sum()"))
...:
In [7]: spend_alpha
Out[7]:
id spend total_sum
0 100 250 NaN
1 101 50 NaN
2 102 60 NaN
3 100 50 NaN
4 102 30 NaN
5 101 50 NaN
I want to be able to place the third argument inside exec() so that I can use the line within a function. This well let me calculate other aggregations other than sum() by passing it to the function. But it is not possible since exec() returns NaNs.
Likewise, the function has to calculate the sum of column spend per unique items of column id from the spend_alpha data frame and save it in a column called total_sum. But it returns NaNs.In [8]: def FeatureMaker(data, criteria_col, f_data):
...: source = data.copy()
...: for key, value in f_data.items():
...: for i in source[criteria_col].unique():
...: source.set_value(source[source[criteria_col] == i].index.tolist(), key, exec('source[source[criteria_col] == i]["' + value[0] + '"].' + value[1] + '()'))
...: return source
In [9]: FeatureMaker(spend_alpha, 'id', {'total_sum': ["spend", "sum"]})
Out[9]:
id spend total_sum
0 100 250 NaN
1 101 50 NaN
2 102 60 NaN
3 100 50 NaN
4 102 30 NaN
5 101 50 NaN
I would be really thankful if anyone can help me overcome this problem.