9

I have a dataframe df

import numpy as np
import pandas as pd

df = pd.DataFrame({'A': ['-a', 1, 'a'],
                   'B': ['a', np.nan, 'c'],
                   'ID': [1, 2, 2],
                   't': [pd.Timestamp.now(), pd.Timestamp.now(),
                         np.nan]})

I then added a new column:

df['YearMonth'] = df['t'].map(lambda x: 100*x.year + x.month)

Now I want to write a function or macro that does a date comparison, creates a new dataframe, and adds a new column to that dataframe.

I tried it like this, but it seems I am going wrong:

def test(df,ym):
    df_new=df
    if(ym <= df['YearMonth']):
        df_new+"_"+ym=df_new
        return df_new+"_"+ym
    df_new+"_"+ym['new_col']=ym

Now, when I call the test function, I want a new dataframe named df_new_201612 to be created, with one additional column named new_col that holds the value of ym in every row.

test(df,201612)

The expected new dataframe is:

df_new_201612

A   B    ID  t                           YearMonth  new_col
-a  a    1   2016-12-05 12:37:56.374620  201612     201612
1   NaN  2   2016-12-05 12:37:56.374644  201208     201612
a   c    2   NaT                         NaN        201612
user07
  • Your code isn't valid Python - the line `df_new+"_"+ym['new_col']=ym` raises a `SyntaxError`. Also, I don't think `return df_new+"_"+ym` does what you think it does. – deepbrook Dec 05 '16 at 12:08
  • I know I am doing something wrong. Please let me know if you have any idea how to implement the above in pandas. – user07 Dec 05 '16 at 12:44
  • Does anyone know how to deal with NaN? The solution below works if I do not have any NaN values in YearMonth. How can I get it to work when there are NaN values too? – user07 Dec 05 '16 at 16:26
  • `df.dropna()` does that for you - [check the pandas docs for more](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.dropna.html) – deepbrook Dec 06 '16 at 06:01

2 Answers

24

Creating variables with dynamic names is typically bad practice.

I think the best solution for your problem is to store your dataframes in a dictionary and generate each key name dynamically to access the corresponding dataframe.

dict_of_df = {}
for ym in [201511, 201612, 201710]:
    key_name = 'df_new_' + str(ym)

    # Store an independent copy of df under the generated key
    dict_of_df[key_name] = df.copy()

    # Set new_col only on rows whose YearMonth is earlier than ym
    to_change = df['YearMonth'] < ym
    dict_of_df[key_name].loc[to_change, 'new_col'] = ym

dict_of_df.keys()
Out[36]: ['df_new_201710', 'df_new_201612', 'df_new_201511']

dict_of_df
Out[37]: 
{'df_new_201511':     A    B  ID                       t  YearMonth  new_col
 0  -a    a   1 2016-12-05 07:53:35.943     201612   201612
 1   1  NaN   2 2016-12-05 07:53:35.943     201612   201612
 2   a    c   2 2016-12-05 07:53:35.943     201612   201612,
 'df_new_201612':     A    B  ID                       t  YearMonth  new_col
 0  -a    a   1 2016-12-05 07:53:35.943     201612   201612
 1   1  NaN   2 2016-12-05 07:53:35.943     201612   201612
 2   a    c   2 2016-12-05 07:53:35.943     201612   201612,
 'df_new_201710':     A    B  ID                       t  YearMonth  new_col
 0  -a    a   1 2016-12-05 07:53:35.943     201612   201710
 1   1  NaN   2 2016-12-05 07:53:35.943     201612   201710
 2   a    c   2 2016-12-05 07:53:35.943     201612   201710}

 # Extract a single dataframe
 df_2015 = dict_of_df['df_new_201511']
FLab
  • I did not understand. My requirement is to call the test function with many YearMonth values and generate a separate dataframe for each one. It would be helpful if you could explain with an example exactly what you mean. – user07 Dec 05 '16 at 12:55
  • Is creating dynamically named variables even possible in Python? I've tried it with anaconda3, but I get `SyntaxError`s left and right. – deepbrook Dec 05 '16 at 12:55
  • Added an example to clarify – FLab Dec 05 '16 at 13:04
  • Thanks for the example, I see what you were trying to say. One more question: how can I access df_new_201511 as a separate dataframe? I will be using these dataframes from the dict for further processing. – user07 Dec 05 '16 at 13:17
  • Thanks a lot, FLab. It seems I can resolve my problem now. – user07 Dec 05 '16 at 13:21
  • I added an extra line in the example. You can access every dataframe using the key name in square brackets, just like in any other python dictionary: dict_of_df[key_name] – FLab Dec 05 '16 at 13:21
  • This is a question different from the one asked. Anyway, when you drop duplicates how do you decide which one to keep? – FLab Dec 05 '16 at 14:31
  • Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/129812/discussion-between-user07-and-flab). – user07 Dec 05 '16 at 14:46
  • How do I deal with NaN if one of the rows in the YearMonth column contains a null? Any idea? – user07 Dec 05 '16 at 15:35
0

There is an easier way to accomplish this using the built-in exec function. The following steps create a dataframe at runtime.

1. Create the source dataframe with some sample values.

import numpy as np
import pandas as pd
    
df = pd.DataFrame({'A':['-a',1,'a'], 
                   'B':['a',np.nan,'c'],
                   'ID':[1,2,2]})

2. Assign a variable that holds the new dataframe name. You can also pass this value in as a parameter or generate it in a loop.

new_df_name = 'df_201612'

3. Use exec to create the new dataframe dynamically as a copy of the source dataframe, then assign a value to a new column on the next line.

exec(f'{new_df_name} = df.copy()')
exec(f'{new_df_name}["new_col"] = 123') 

4. The dataframe df_201612 is now available in memory; you can verify this by printing it with eval.

print(eval(new_df_name))
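
As one possible variant of the steps above, exec also accepts a dictionary to use as its namespace, which keeps the generated name out of the module globals. The sketch below assumes that is acceptable for your use case; the name namespace is illustrative and not part of the steps above. Because exec runs whatever string it is given, the dataframe name should only come from trusted input.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': ['-a', 1, 'a'],
                   'B': ['a', np.nan, 'c'],
                   'ID': [1, 2, 2]})

new_df_name = 'df_201612'

# Illustrative namespace dict: exec reads `df` from it and stores the
# dynamically named dataframe back into it.
namespace = {'df': df}
exec(f'{new_df_name} = df.copy()', namespace)
exec(f'{new_df_name}["new_col"] = 123', namespace)

# Retrieve the new dataframe by name with an ordinary dictionary lookup.
result = namespace[new_df_name]
print(result)
```

The lookup namespace[new_df_name] plays the role of eval(new_df_name) in step 4.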
Sarath Subramanian