
I have a function that's intended to create and name a new data frame based on the field name that is being passed into the function.

Assume the data frame df has the fields "date", "sales", and "orders". After running the function, I want the resulting data frame to be named, for example, sales_trend when it is the result of trend(df, "sales").

def trend(df, field_name):
    df_name = df.groupby('date')[field_name].mean().reset_index().sort_values(by='date', ascending=True)
    return (field_name + '_trend') = df_name

I'm clearly not doing this right. Any suggestions would be much appreciated.


2 Answers


In general, functions don't return a name; they return an object. You may refer to the following posts on that topic:

  1. How to write a function to return the variable name in Python
  2. http://effbot.org/pyfaq/how-can-my-code-discover-the-name-of-an-object.htm
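To make the point concrete, here is a minimal sketch in plain Python (the names make_numbers, result and alias are hypothetical): the function hands back an object, and the caller chooses which name to bind it to. Because one object can be bound to several names, there is no single name it could report.

```python
def make_numbers():
    # returns a list object; the list itself carries no variable name
    return [1, 2, 3]

# the caller decides the name
result = make_numbers()

# a second name for the very same object
alias = result
print(alias is result)  # True: two names, one object
```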

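I believe you are trying to implement something like the code below, which stores each result in a dictionary under the name you wanted:

    import pandas as pd

    def trend(df, field_name):
        # mean of field_name per date, sorted by date
        df_name = df.groupby('date')[field_name].mean().reset_index().sort_values(by='date', ascending=True)
        return df_name


    mydic = {}
    field_name = 'sales'

    # the computed string becomes a dictionary key, not a variable name
    mydic[field_name + '_trend'] = trend(df, field_name)
    print(mydic['sales_trend'])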
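If you want trends for several fields, the same idea extends to a loop. This is a sketch assuming the question's column names; the small data frame and the dictionary name trends are made up for illustration.

```python
import pandas as pd

def trend(df, field_name):
    # mean of the field per date, sorted chronologically
    return df.groupby('date')[field_name].mean().reset_index().sort_values(by='date')

# small hypothetical data set with the columns from the question
df = pd.DataFrame({
    'date': ['2017-01-01', '2017-01-02', '2017-01-01', '2017-01-02'],
    'sales': [1.0, 2.0, 3.0, 4.0],
    'orders': [10, 20, 30, 40],
})

# one dictionary entry per field, keyed by the name you wanted to create
trends = {}
for field_name in ['sales', 'orders']:
    trends[field_name + '_trend'] = trend(df, field_name)

print(trends['sales_trend'])
```

Looking results up by key (trends['orders_trend'], and so on) keeps every trend in one place, and you can iterate over trends.items() instead of guessing which variables exist.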
    

It is possible to dynamically add names to the global namespace by modifying globals(), but it is strongly discouraged. Use a dictionary instead (as described by Shijo).
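For reference, a minimal sketch of what the globals() route looks like, next to the dictionary it should be replaced with. The helper store_global is hypothetical, and this only behaves as shown for module-level names.

```python
def store_global(name, value):
    # writes into the module's global namespace; this works, but the
    # name appears "out of nowhere" and can silently overwrite another one
    globals()[name] = value

store_global('sales_trend', [2.0, 3.0])
print(sales_trend)  # the name now exists, created dynamically

# preferred: keep dynamically named results in a dictionary
results = {'sales_trend': [2.0, 3.0]}
print(results['sales_trend'])
```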

Another approach is to aggregate all columns you need on the same GroupBy object. For example, given the following data frame

import numpy as np
import pandas as pd

np.random.seed(0)

# generate fake data
date_range = pd.Series(pd.date_range('2017-01-01', periods=3))
df = pd.DataFrame({
    'date': pd.concat([date_range] * 2),
    'sales': np.random.normal(0, 1, 6),
    'orders': np.random.normal(0, 1, 6)
}).reset_index(drop=True)
print(df)
        date    orders     sales
0 2017-01-01  0.950088  1.764052
1 2017-01-02 -0.151357  0.400157
2 2017-01-03 -0.103219  0.978738
3 2017-01-01  0.410599  2.240893
4 2017-01-02  0.144044  1.867558
5 2017-01-03  1.454274 -0.977278

you can do

# the fields for which you want to compute trends
field_names = ['sales', 'orders']

# compute trends using a single GroupBy
trend = df.groupby('date', as_index=False)[field_names].mean().sort_values('date')
print(trend)
        date     sales    orders
0 2017-01-01  2.002473  0.680343
1 2017-01-02  1.133858 -0.003657
2 2017-01-03  0.000730  0.675527
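As a sanity check, here is a sketch that recomputes the question's per-field trend() (one GroupBy per call) and compares it with the single GroupBy above. It regenerates the same fake data, so the numbers should agree column by column.

```python
import numpy as np
import pandas as pd

def trend(df, field_name):
    # per-field version from the question (one GroupBy per call)
    return df.groupby('date')[field_name].mean().reset_index().sort_values(by='date')

# same fake data as above
np.random.seed(0)
date_range = pd.Series(pd.date_range('2017-01-01', periods=3))
df = pd.DataFrame({
    'date': pd.concat([date_range] * 2),
    'sales': np.random.normal(0, 1, 6),
    'orders': np.random.normal(0, 1, 6),
}).reset_index(drop=True)

field_names = ['sales', 'orders']
combined = df.groupby('date', as_index=False)[field_names].mean().sort_values('date')

# each column of the combined result should match the per-field call
for field in field_names:
    separate = trend(df, field)
    assert np.allclose(combined[field].to_numpy(), separate[field].to_numpy())
print('combined GroupBy matches per-field trends')
```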

Now you can use the resulting trend data frame similarly to a namespace. Where you had wanted to use the name sales_trend, you can instead use trend['sales'].

print(trend['sales'])
0    2.002473
1    1.133858
2    0.000730
Name: sales, dtype: float64
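If you need the date column alongside the values (the shape the question's trend(df, "sales") would produce), selecting a list of columns gives it. A minimal sketch; the small frame below is a hypothetical stand-in for the trend result above, and sales_trend_view is an illustrative name.

```python
import pandas as pd

# hypothetical stand-in for the combined `trend` frame above
trend = pd.DataFrame({
    'date': pd.to_datetime(['2017-01-01', '2017-01-02']),
    'sales': [2.0, 3.0],
    'orders': [20.0, 30.0],
})

# a two-column frame shaped like what trend(df, 'sales') would return
sales_trend_view = trend[['date', 'sales']]
print(sales_trend_view)

# or loop over every field without creating any new variable names
for field in ['sales', 'orders']:
    print(field, trend[field].tolist())
```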