Create a new column filled by a variable name in python

Question

Let's supposed I have a dataframe:

df = quandl.get("FRED/DEXBZUS")

The output would be:

print(df)

    Year    Value
1995-01-02  0.8440
1995-01-03  0.8450
1995-01-04  0.8450
1995-01-05  0.8430
1995-01-06  0.8400
1995-01-09  0.8440
1995-01-10  0.8470
1995-01-11  0.8510

I am trying to create a new column, filled by the variable name:

print(df)

    Year     Value  Variable
1995-01-02  0.8440    df
1995-01-03  0.8450    df
1995-01-04  0.8450    df
1995-01-05  0.8430    df
1995-01-06  0.8400    df
1995-01-09  0.8440    df
1995-01-10  0.8470    df
1995-01-11  0.8510    df

I would like to do this, in a loop process, using two differents dataframes:

df = quandl.get("FRED/DEXBZUS")
df2 = quandl.get("FRED/DEXBZUS")

data = [df, df2]

for i in data:

dps = []

for i in df:
        d = i.reset_index()
        d = pd.DataFrame(d)
        d['variable'] = [i]

But I didn't get the variables name inside the columns.

It should be like this :

    Year     Value  Variable
1995-01-02  0.8440    df
1995-01-03  0.8450    df
1995-01-04  0.8450    df
1995-01-05  0.8430    df
1995-01-06  0.8400    df
1995-01-09  0.8440    df
1995-01-10  0.8470    df
1995-01-11  0.8510    df


2008-01-02  0.8440    df2
2008-01-03  0.8450    df2
2008-01-04  0.8450    df2
2008-01-05  0.8430    df2
2008-01-06  0.8400    df2
2008-01-09  0.8440    df2
2008-01-10  0.8470    df2
2008-01-11  0.8510    df2

Assuming the identation is not correct, the second loop `for` use the same variable `i` for iteration, this should be change — Ben.T, May 29 '18 at 14:30
Why do you have separate *df* variables in the first place? Import or source your dataframes inside a dictionary with variable name as key: `df_dict['df'] = quandl.get("FRED/DEXBZUS")`. — Parfait, May 29 '18 at 17:31

Ashish Acharya · Answer 1 · 2018-05-29T14:55:30.277

Not sure if this is the best way to do it, but it works:

In [56]: df_list = []
    ...: for i in locals():
    ...:     try:
    ...:         if type(locals()[i]) == pd.core.frame.DataFrame and not i.startswith('_'):
    ...:             df_list.append(i)            
    ...:     except KeyError:
    ...:         pass  

In [57]: df_list
Out[57]: ['df', 'df2']

In [58]: for d in df_list:
    ...:     locals()[d]['Variable'] = d

In [59]: df
Out[59]: 
         Year  Value Variable
0  1995-01-02  0.844       df
1  1995-01-03  0.845       df
2  1995-01-04  0.845       df
3  1995-01-05  0.843       df
4  1995-01-06  0.840       df
5  1995-01-09  0.844       df
6  1995-01-10  0.847       df
7  1995-01-11  0.851       df

In [60]: df2
Out[60]: 
         Year  Value Variable
0  2008-01-02  0.844      df2
1  2008-01-03  0.845      df2
2  2008-01-04  0.845      df2
3  2008-01-05  0.843      df2
4  2008-01-06  0.840      df2
5  2008-01-09  0.844      df2
6  2008-01-10  0.847      df2
7  2008-01-11  0.851      df2

score 0 · Accepted Answer · answered May 29 '18 at 15:05

To get the name of a variable, we can use the code from this answer, copied below:

import inspect


def retrieve_name(var):
        """
        Gets the name of var. Does it from the out most frame inner-wards.
        :param var: variable to get name from.
        :return: string
        """
        for fi in reversed(inspect.stack()):
            names = [var_name for var_name, var_val in fi.frame.f_locals.items() if var_val is var]
            if len(names) > 0:
                return names[0]

The problem with this is that it won't work when looping through say a list, because you will simply get the name of the local variable. This has to do with how variable names work in python. A variable points to an object, i.e. a place in memory but the place in memory does not point back. That means given an object, you cannot really determine its name. The same is true for containers like lists. If you for example have a list l that contains two objects a and b l=[a,b], the list does not actually save the names of the variables a and b. Instead when you create the list, it records the place in memory that a and b are pointing to, i.e. the objects rather than the names.

d = 'a'
print(retrieve_name(d))
#'d'
l = [d, d]
print([retrieve_name(element) for element in list ])
#['element', 'element']

That being said, if you have a dictionary of names and objects, you can do what you asked for:

name_dict = {'df': df, 'df2':df2}
dfs = [frame.assign(Variable=name) for name, frame in name_dict.items()]
combined_df = pd.concat(dfs)

However, if your DataFrames actually all have different Data Sources, then there is an easier way of doing all this. I often face this problem of having data in several different sources and their names are for example file names. Let's say I have a couple of .csv files from which I am reading data and I want to combine them all into a pd.DataFrame but want each row to remember from which file it came.

import pandas as pd
#Let's make our two fake csv files a and b:
with open('a.csv', mode='w') as a, open('b.csv', mode='w') as b:
     a.write('col1,col2\n1,1')
     b.write('col1,col2\n2,2')

csv_files = ['a.csv', 'b.csv']
dfs = [pd.read_csv(csv_file).assign(filename=csv_file) for csv_file in csv_files] 
#assign let's you assign the value of a column and returns a DataFrame, so it's 
#great for list comprehensions, in which the df['some_col']='some_var'
#syntax does not work

combined_ab = pd.concat(dfs)
combined_ab
#   col1  col2 filename
#0     1     1    a.csv
#0     2     2    b.csv

Create a new column filled by a variable name in python

2 Answers2