Create and name a data frame based on two strings

Question

I have a function that's intended to create and name a new data frame based on the field name that is being passed into the function.

Assume the data frame df has the fields "date", "sales", and "orders". Once I run the function, I want the resulting data frame to be named, for example, sales_trend, which would be the result of trend(df, "sales").

def trend(df, field_name):
    df_name = df.groupby('date')[field_name].mean().reset_index().sort_values(by='date', ascending=True)
    return (field_name + '_trend') = df_name

I'm clearly not doing this right. Any suggestions would be much appreciated.

I would recommend using the name `"sales_trend"` as a key to a dictionary, where the dataframe is the value — Patrick Haugh, Jan 19 '17 at 16:21
Do you want the trend function to return a string or a dataframe? — denvaar, Jan 19 '17 at 16:36
I want it to return a dataframe but this dataframe should assume the name based on the field i pass. — BlackHat, Jan 19 '17 at 17:09

score 2 · Answer 1 · edited May 23 '17 at 12:08

In general, a function does not return a name; it returns an object. You may refer to the following posts on that point.

How to write a function to return the variable name in Python

http://effbot.org/pyfaq/how-can-my-code-discover-the-name-of-an-object.htm

I believe you are trying to implement the below code

def trend(df, field_name):
    df_name = df.groupby('date')[field_name].mean().reset_index().sort_values(by='date', ascending=True)
    return df_name


# store each result in a dictionary, keyed by the name you wanted
mydic = {}
field_name = 'sales'

mydic[field_name + '_trend'] = trend(df, field_name)
print(mydic['sales_trend'])
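
The same idea extends to several fields at once. Below is a minimal sketch, assuming a small made-up data frame; the dictionary name trends and the sample values are only illustrative and are not from the question.

```python
import pandas as pd


def trend(df, field_name):
    """Return the mean of field_name per date, sorted by date."""
    return (
        df.groupby('date')[field_name]
        .mean()
        .reset_index()
        .sort_values(by='date', ascending=True)
    )


# Made-up sample data, for illustration only
df = pd.DataFrame({
    'date': pd.to_datetime(['2017-01-01', '2017-01-02', '2017-01-01', '2017-01-02']),
    'sales': [10.0, 20.0, 30.0, 40.0],
    'orders': [1.0, 2.0, 3.0, 4.0],
})

# One dictionary entry per field, keyed by the name the variable would have had
trends = {field + '_trend': trend(df, field) for field in ['sales', 'orders']}

sales_trend = trends['sales_trend']
print(sales_trend)
```

If you really want a plain variable, you can still bind one explicitly from the dictionary, as the last lines do.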

score 0 · Answer 2 · edited May 23 '17 at 12:16

It is possible to dynamically add names to the global namespace by modifying globals(), but it is strongly discouraged. Use a dictionary instead (as described by Shijo).
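
To make the difference concrete, here is a small sketch of both options. A plain list stands in for the data frame, and the names result and trends are hypothetical.

```python
# Stand-in for a computed result; in the question this would be a data frame
result = [1.0, 2.0, 3.0]
field_name = 'sales'

# Discouraged: this creates a module-level variable named sales_trend.
# Nothing in the source text shows that the name exists, so readers and
# tools such as linters cannot see where it comes from.
globals()[field_name + '_trend'] = result
print(globals()['sales_trend'])

# Preferred: an ordinary dictionary keeps the dynamic names in one place
trends = {}
trends[field_name + '_trend'] = result
print(trends['sales_trend'])
```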

Another approach is to aggregate all columns you need on the same GroupBy object. For example, given the following data frame

import numpy as np
import pandas as pd

np.random.seed(0)

# generate fake data
date_range = pd.Series(pd.date_range('2017-01-01', periods=3))
df = pd.DataFrame({
    'date': pd.concat([date_range] * 2),
    'sales': np.random.normal(0, 1, 6),
    'orders': np.random.normal(0, 1, 6)
}).reset_index(drop=True)
print(df)

        date    orders     sales
0 2017-01-01  0.950088  1.764052
1 2017-01-02 -0.151357  0.400157
2 2017-01-03 -0.103219  0.978738
3 2017-01-01  0.410599  2.240893
4 2017-01-02  0.144044  1.867558
5 2017-01-03  1.454274 -0.977278

you can do

# the fields for which you want to compute trends
field_names = ['sales', 'orders']

# compute trends using a single GroupBy
trend = df.groupby('date', as_index=False)[field_names].mean().sort_values('date')
print(trend)

        date     sales    orders
0 2017-01-01  2.002473  0.680343
1 2017-01-02  1.133858 -0.003657
2 2017-01-03  0.000730  0.675527

Now you can use the resulting trend data frame similarly to a namespace. Where you had wanted to use the name sales_trend, you can instead use trend['sales'].

print(trend['sales'])

0    2.002473
1    1.133858
2    0.000730
Name: sales, dtype: float64
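
If you need a standalone two-column data frame like the sales_trend from the question, you can slice it out of the combined result. This is a sketch on made-up values (not the random data above), and the variable name sales_trend is just a local binding chosen for illustration.

```python
import pandas as pd

# Made-up sample data, for illustration only
df = pd.DataFrame({
    'date': pd.to_datetime(['2017-01-01', '2017-01-02', '2017-01-01', '2017-01-02']),
    'sales': [10.0, 20.0, 30.0, 40.0],
    'orders': [1.0, 2.0, 3.0, 4.0],
})

field_names = ['sales', 'orders']
trend = df.groupby('date', as_index=False)[field_names].mean().sort_values('date')

# Keep the date column alongside the field to get a frame shaped like
# the result of the original trend(df, "sales") function
sales_trend = trend[['date', 'sales']]
print(sales_trend)
```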
