Renaming columns in a Pandas dataframe with duplicate column names?

Question

I have a df X with columns with duplicate names:

In [77]: X_R
Out[77]: 
      dollars  dollars
   0   0.7085   0.5000

I want to rename it so that I have:

In [77]: X_R
Out[77]: 
       Retail   Cost
   0   0.7085   0.5000

Using the Pandas rename function does' work:

X_R.rename(index=str, columns={"dollars": "Retail", "dollars": "Cost"})

Just gives me two columns named Cost.

How can I rename the columns in this case?

Just in case you didn't realize, `{"dollars": "Retail", "dollars": "Cost"}` becomes `{"dollars": "Cost"}` — wjandrea, Aug 27 '22 at 22:24

score 26 · Answer 1 · answered Nov 23 '16 at 22:12

26

Here is a dynamic solution:

In [59]: df
Out[59]:
   a  x  x  x  z
0  6  2  7  7  8
1  6  6  3  1  1
2  6  6  7  5  6
3  8  3  6  1  8
4  5  7  5  3  0

In [60]: d
Out[60]: {'x': ['x1', 'x2', 'x3']}

In [61]: df.rename(columns=lambda c: d[c].pop(0) if c in d.keys() else c)
Out[61]:
   a  x1  x2  x3  z
0  6   2   7   7  8
1  6   6   3   1  1
2  6   6   7   5  6
3  8   3   6   1  8
4  5   7   5   3  0

answered Nov 23 '16 at 22:12

MaxU - stand with Ukraine

205,989
36
386
419

I was wondering how to get the duplicated column names in pandas. Pandas only takes the last column, also if I read the data from file it gives me `x x.1 x.2`. – Dec 04 '18 at 01:10
@astro123, if you want to preserve duplicates in column names use `mangle_dupe_cols=False` parameter. `df.columns[df.columns.duplicated()]` - will give you duplicated column names – MaxU - stand with Ukraine Dec 04 '18 at 14:31

score 23 · Answer 2 · answered Sep 20 '17 at 22:36

Here is another dynamic solution that I think is nicer

In [59]: df
Out[59]:
   a  x  x  x  z
0  6  2  7  7  8
1  6  6  3  1  1
2  6  6  7  5  6
3  8  3  6  1  8
4  5  7  5  3  0
In [61]: class renamer():
             def __init__(self):
                  self.d = dict()

              def __call__(self, x):
                  if x not in self.d:
                      self.d[x] = 0
                      return x
                  else:
                      self.d[x] += 1
                      return "%s_%d" % (x, self.d[x])

          df.rename(columns=renamer())
Out[61]:
   a  x  x_1  x_2  z
0  6   2   7   7  8
1  6   6   3   1  1
2  6   6   7   5  6
3  8   3   6   1  8
4  5   7   5   3  0

thanks for this; i was able to use this to rename columns before doing a column drop... which apparently has trouble with duplicate column names? — beep_check, Oct 10 '19 at 20:10

score 17 · Accepted Answer · answered Nov 23 '16 at 22:10

17

X_R.columns = ['Retail','Cost']

answered Nov 23 '16 at 22:10

Mihkorz

824
7
6

score 7 · Answer 4 · answered Jan 06 '21 at 14:00

Not directly an answer, but since this a top search result, here is a short and flexible solution to append a suffix to duplicate column names:

# A dataframe with duplicated column names
df = pd.DataFrame([[1, 2, 3],
                   [4, 5, 6],
                   [7, 8, 9]])
df.columns = ['a', 'b', 'b']

# Columns to not rename
excluded = df.columns[~df.columns.duplicated(keep=False)]

# An incrementer
import itertools
inc = itertools.count().__next__

# A renamer
def ren(name):
    return f"{name}{inc()}" if name not in excluded else name

# Use inside rename()
df.rename(columns=ren)

    a   b   b              a  b0  b1
0   1   2   3          0   1   2   3
1   4   5   6    =>    1   4   5   6
2   7   8   8          2   7   8   9

Note that the suffix counter is shared, i.e. if you have multiple sets of duplicates they aren't handled independently. `a, a, b, b` becomes `a0, a1, b2, b3`. — wjakobw, Oct 11 '21 at 13:22
This is lovely! I used it a loop on a large dictionary of dataframes imported from very messy spreadsheets and it worked perfectly for my needs. Thank you! — Steph, Feb 18 '22 at 17:55

Benedictanjw · Answer 5 · 2020-09-01T03:38:51.813

MaxU's answer helped me with this same problem. In this answer, I add in a way to find those duplicated column headers.

First, we make a dictionary of the duplicated column names with values corresponding to the desired new column names. For this, the defaultdict subclass is required.

import pandas as pd
from collections import defaultdict

renamer = defaultdict()

We iterate over the duplicated column names to create a dictionary with keys being the duplicated column name and values being a list of new column names. I have chosen this list to be original name_0, original name_1, and so on.

for column_name in df.columns[df.columns.duplicated(keep=False)].tolist():
    if column_name not in renamer:
        renamer[column_name] = [column_name+'_0']
    else:
        renamer[column_name].append(column_name +'_'+str(len(renamer[column_name])))

print(renamer)
defaultdict(None, {'b': ['b_0', 'b_1', 'b_2', 'b_3'], 'c': ['c_0', 'c_1']})

Original dataframe:

print(df)
        a   b   b   b   b   c   c   d
Item 0  2   1   0   2   8   3   9   5
Item 1  3   2   7   3   5   4   6   2
Item 2  4   3   8   1   5   7   4   4
Item 3  5   5   3   6   0   5   2   5

Rename the duplicated columns by assigning the new names from our renamer defaultdict, leaving the unduplicated columns alone

df.rename(
    columns=lambda column_name: renamer[column_name].pop(0)
    if column_name in renamer 
    else column_name
)
        a   b_0 b_1 b_2 b_3 c_0 c_1 d
Item 0  2   1   0   2   8   3   9   5
Item 1  3   2   7   3   5   4   6   2
Item 2  4   3   8   1   5   7   4   4
Item 3  5   5   3   6   0   5   2   5

(As a sidenote, a couple of people have questioned why duplicated column names existed in the first place. For myself, I encountered duplicated column names when importing with the xlwings package (to deal with password-protected Excel files). You could also inadvertently create duplicate column names by using pd.concat.

Renaming columns in a Pandas dataframe with duplicate column names?

5 Answers5

Linked