10

I'm not sure if this is a dumb way to go about things, but I've got several data frames, all of which have identical columns. I need to rename the columns within each to reflect the names of each data frame (I'll be performing an outer merge of all of these afterwards).

Let's say the data frames are called df1, df2 and df3, and each contains the columns name, date, and count.

I'd like to rename each of the columns in df1 into name_df1, date_df1, and count_df1.

I've written a function to rename the columns, thus:

df_list=[df1, df2, df3]

def rename_cols():
    col_name="name"+suffix
    col_count="count"+suffix
    col_date="date"+suffix

for x in df_list:
    if x['name'].tail(1).item() == df1['name'].tail(1).item():
        suffix="_"+"df1"
        rename_cols()
        continue
    elif x['name'].tail(1).item() == df2['name'].tail(1).item():
        suffix="_"+"df2"
        rename_cols()
        continue
    else:
        suffix="_"+"df3"
        rename_cols()

    col_names=[col_name,col_date,col_count]
    x.columns=col_names

Unfortunately, I get the following error: KeyError: 'name'

I'm really struggling to figure out why that's going on. The columns for df1, the first data frame in the df_list, gets renamed. Everything else stays the same... Am I messing up basic syntax (probably), or is there a fundamental misunderstanding that I've got of how things should work?

From what I can ascertain, the first data frame in the list is being iterated through more than once — but why would that be the case?

scrollex
  • 2,575
  • 7
  • 24
  • 38

5 Answers5

13

I guess you can achieve this with something simplier, like that :

df_list=[df1, df2, df3]
for i, df in enumerate(df_list, 1):
    df.columns = [col_name+'_df{}'.format(i) for col_name in df.columns]

If your DataFrames have prettier names you can try:

df_names=('Home', 'Work', 'Park')
for df_name in df_names:
    df = globals()[df_name]
    df.columns = [col_name+'_{}'.format(df_name) for col_name in df.columns]

Or you can fetch the name of each variable by looking up into globals() (or locals()) :

df_list = [Home, Work, Park]
for df in df_list:
    name = [k for k, v in globals().items() if id(v) == id(df) and k[0] != '_'][0]
    df.columns = [col_name+'_{}'.format(name) for col_name in df.columns]
mgc
  • 5,223
  • 1
  • 24
  • 37
  • Is there a way to do this without generating a `df+x` type of name? I've simplified things here, but in my scenario I'd need to create names like "Home", "Work", etc. Otherwise, this is terrific. – scrollex Jan 17 '16 at 21:46
  • I'm not sure to understand. Do you mean a 'Home' suffix for the columns of the first DataFrame, then a 'Work' prefix for the second, etc. ? – mgc Jan 17 '16 at 21:53
  • Yup, exactly. How would it be possible to do the same if `df_list=[Home, Work, Park]`? – scrollex Jan 17 '16 at 21:55
  • @scrollex I edited my answer (but i doesn't fit exactly your scenario as i'm using directly the name of the DataFrames). – mgc Jan 17 '16 at 22:11
  • No, that's great. Thank you! – scrollex Jan 17 '16 at 22:19
11

My preferred rather simple way of doing this, especially when you want to apply some logic to all column names is:

for col in df.columns:
    df.rename(columns={col:col.upper().replace(" ","_")},inplace=True)
Hisham Sajid
  • 191
  • 2
  • 10
4

I'll suppose that you have your stored in a dictionary as this is the idiomatic way of storing a series of named objects in Python. The idiomatic pandas way of changing your column names is to use a vectorised string operation on df.columns:

df_dict = {"df1":df1, "df2":df2, "df3":df3}
for name, df in df_dict.items():
   df.columns = df.columns + "_" + name

Another option to consider is adding the suffixes automatically during the merge. When you call merge you can specify the suffixes that will be appended to duplicate column names with the suffixes parameter. If you just want to append the names of the dataframes, you can call it like this. :

from itertools import reduce
df_merged = reduce(lambda x,y: ("df_merged", 
                               x[1].merge(y[1], left_index=True, right_index=True, 
                                         suffixes = ("","_"+y[0]))),
                   df_dict.items())[1]
maxymoo
  • 35,286
  • 11
  • 92
  • 119
3

For completeness, since nobody has mentioned df.rename, see Andy Hayden's answer here:

Renaming columns in pandas

df.rename can take a function as an argument, so in this case:

df_dict = {'df1':df1,'df2':df2,'df3':df3}
for name,df in df_dict.items():
    df.rename(lambda x: x+'_'+name, inplace=True)
Community
  • 1
  • 1
majr
  • 263
  • 1
  • 6
0

A more simple way

Get total length from cursor.description Then convert it into list Apply the list directly into DF

num_fields = len(cursor.description)
field_names = [ i[0] for i in cursor.description ]
df.columns = field_names
W Kenny
  • 1,855
  • 22
  • 33