
I use the following pandas expression

df = df[df.columns[~df.columns.str.contains('Unnamed:')]]

to drop columns whose names contain "Unnamed:". I got it from the question "Remove Unnamed columns in pandas dataframe".

For some reason, in some cases this line makes the number of columns explode, e.g.:

df shape before: (2000, 1451)
after dropping Unnamed: (2000, 3851)

In particular, it seems to happen when some columns share the same name, i.e. there are duplicate column names.

Does anyone know why this happens and how to avoid it?

How do I drop columns whose names contain a certain substring when duplicate column names are allowed? Thanks
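A minimal reproduction, assuming a small made-up frame (the column names and data are illustrative, not from the post). Selecting by a list of names looks each name up separately, and a name that occurs k times matches all k columns every time it is listed, so it contributes k × k columns to the result:

```python
import pandas as pd

# Illustrative frame: 'a' appears twice, plus one 'Unnamed:' column to drop.
df = pd.DataFrame([[1, 2, 3, 4]], columns=['a', 'a', 'b', 'Unnamed: 0'])

# The names that survive the filter still contain 'a' twice.
keep = df.columns[~df.columns.str.contains('Unnamed:')]
print(list(keep))  # ['a', 'a', 'b']

# Name-based selection: each 'a' in `keep` matches both 'a' columns,
# so 'a' contributes 2 * 2 = 4 columns instead of 2.
exploded = df[keep]
print(exploded.shape)  # (1, 5)
print(list(exploded.columns))  # ['a', 'a', 'a', 'a', 'b']
```

With many duplicated names this squaring effect is consistent with the jump from 1451 to 3851 columns described above.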

YohanRoth

2 Answers


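You're slicing with column names when you clearly have repeated names, and df[names] matches every column with a given name each time that name appears in the list. You want to slice using loc and a boolean mask, which selects by position instead.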

df = df.loc[:, ~df.columns.str.contains('Unnamed:')]
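A quick check of the fix on a small illustrative frame (the data is made up): the boolean mask has exactly one entry per column position, so loc keeps or drops each position once and duplicated names are never multiplied.

```python
import pandas as pd

# Illustrative frame with a duplicated 'a' column and one 'Unnamed:' column.
df = pd.DataFrame([[1, 2, 3, 4]], columns=['a', 'a', 'b', 'Unnamed: 0'])

# One boolean per column position: True means keep that column.
mask = ~df.columns.str.contains('Unnamed:')

result = df.loc[:, mask]
print(result.shape)  # (1, 3)
print(list(result.columns))  # ['a', 'a', 'b']
```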
piRSquared

I recommend fixing the duplicated column names first:

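s = df.columns.to_series()
# occurrence number of each name: 0 for the first, 1 for the second, ...
s1 = s.groupby(s).cumcount().astype(str)
# unique names stay as-is; repeated names get their count appended
newc = s + s1.mask(s1 == '0', '')
newc  # for columns ['a', 'a', 'b']:
Out[717]:
a     a
a    a1
b     b
dtype: object
df.columns = newc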
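Putting the two steps together, a sketch assuming the same kind of illustrative frame as above (made-up data): once the names are unique, even the original name-based line from the question no longer multiplies columns. One assumption worth stating: this suffix scheme could still collide if a column such as 'a1' already exists.

```python
import pandas as pd

# Illustrative frame: duplicated 'a' plus an 'Unnamed:' column.
df = pd.DataFrame([[1, 2, 3, 4]], columns=['a', 'a', 'b', 'Unnamed: 0'])

# Step 1: make names unique by appending an occurrence count to repeats.
s = df.columns.to_series()
s1 = s.groupby(s).cumcount().astype(str)
df.columns = s + s1.mask(s1 == '0', '')

# Step 2: the original name-based filter is now safe, because every
# remaining name matches exactly one column.
df = df[df.columns[~df.columns.str.contains('Unnamed:')]]
print(list(df.columns))  # ['a', 'a1', 'b']
print(df.shape)  # (1, 3)
```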
BENY
    @YohanRoth this appends an occurrence count to each name: unique names are left unchanged, and each repeated name gets its count appended so that it becomes unique. – BENY Jun 24 '19 at 14:29