
I want to select the columns of a pandas data frame whose values are not all identical, and use those columns to build a subset data frame. For example, I have a data frame like this:

   x  y  z
a  1  2  3
b  1  2  2
c  1  2  3
d  4  2  3

The columns "x" and "z" each contain more than one distinct value, so I want to pick them out and create a new data frame like:

   x  z
a  1  3
b  1  2
c  1  3
d  4  3

This can be done with the following code:

import pandas as pd
df = pd.DataFrame([[1,2,3],[1,2,2],[1,2,3],[4,2,3]],index=['a','b','c','d'],columns=['x','y','z'])
df0 = pd.DataFrame()
for i in range(df.shape[1]):
    if df.iloc[:,i].nunique() > 1:
        df1 = df.iloc[:,i].T
        df0 = pd.concat([df0,df1],axis=1, sort=False)

However, there must be simpler and more direct methods. What are they?

Best regards

Alex
Yeping Sun
  • When you say "non-duplicate values", do you mean that the column does not contain the same value in every row? – impulso Mar 21 '19 at 16:51
  • Possible duplicate of [python panda remove duplicate columns](https://stackoverflow.com/questions/14984119/python-pandas-remove-duplicate-columns) – Clement Lombard Mar 21 '19 at 16:54

4 Answers

df[df.columns[(df.nunique()!=1).values]]

Maybe you can try this one-liner.
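The same idea can be spelled out step by step. The following is only a sketch on the question's example frame; the names `counts`, `mask`, and `result` are illustrative and not part of the original answer. It passes the boolean mask to `.loc` on the column axis instead of indexing `df.columns` first.

```python
import pandas as pd

df = pd.DataFrame(
    [[1, 2, 3], [1, 2, 2], [1, 2, 3], [4, 2, 3]],
    index=["a", "b", "c", "d"],
    columns=["x", "y", "z"],
)

# Number of distinct values in each column (x: 2, y: 1, z: 2)
counts = df.nunique()

# Boolean Series indexed by column name; True where the column is not constant
mask = counts != 1

# .loc accepts the boolean Series directly on the column axis
result = df.loc[:, mask]
print(result)
```

The result keeps the original row index and column order, so it should match the desired output shown in the question.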

Justice_Lords

Apply nunique, then remove columns where nunique is 1:

nunique = df.apply(pd.Series.nunique)
cols_to_drop = nunique[nunique == 1].index
df = df.drop(cols_to_drop, axis=1)
Alex
df = df[df.columns[df.nunique() > 1]]

This assumes that a column whose values are all the same gives nunique() == 1, while any other column gives a value greater than 1. df.columns[df.nunique() > 1] returns the names of all columns that meet this condition.
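One case where that assumption needs care is missing data: by default `nunique()` ignores NaN, so a column that is entirely NaN counts 0 distinct values, and a constant column with a gap still counts 1. The sketch below uses a made-up frame (`df_nan` and its column names are hypothetical, not from the question) to show how `> 1`, `!= 1`, and `dropna=False` can disagree.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with missing values, for illustration only
df_nan = pd.DataFrame({
    "const": [1, 1, 1],                   # same value everywhere
    "gap": [1.0, np.nan, 1.0],            # constant apart from one NaN
    "empty": [np.nan, np.nan, np.nan],    # nothing but NaN
    "varies": [1, 2, 3],                  # genuinely varying
})

default_counts = df_nan.nunique()              # NaN is ignored
strict_counts = df_nan.nunique(dropna=False)   # NaN counts as a value

# Columns kept under each rule
kept_gt = df_nan.loc[:, default_counts > 1].columns.tolist()
kept_ne = df_nan.loc[:, default_counts != 1].columns.tolist()
kept_strict = df_nan.loc[:, strict_counts > 1].columns.tolist()

print(kept_gt, kept_ne, kept_strict)
```

Which rule is right depends on whether a NaN should count as "a different value" for the data at hand; the question's example has no missing values, so all three rules agree there.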

Akhilesh_IN

A simple one-liner:

df0 = df.loc[:,(df.max()-df.min())!=0]

or, even better:

df0 = df.loc[:,(df.max()!=df.min())]
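As a quick sanity check, the max/min comparison can be run next to a `nunique()` mask on the question's frame. This is a sketch that assumes purely numeric columns with no missing values, as in the example; the names `by_range` and `by_nunique` are illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    [[1, 2, 3], [1, 2, 2], [1, 2, 3], [4, 2, 3]],
    index=["a", "b", "c", "d"],
    columns=["x", "y", "z"],
)

# A column is constant exactly when its maximum equals its minimum
by_range = df.loc[:, df.max() != df.min()]

# The same selection expressed through the number of distinct values
by_nunique = df.loc[:, df.nunique() > 1]

# True when both approaches select the same data
print(by_range.equals(by_nunique))
```

The comparison form avoids the subtraction, so it does not rely on the column values supporting arithmetic, only ordering.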

Lior Cohen