How to select columns which contain non-duplicate values from a pandas data frame

Question

I want to select the columns of a pandas data frame that contain non-duplicate values and use these columns to make up a subset data frame. For example, I have a data frame like this:

   x  y  z
a  1  2  3
b  1  2  2
c  1  2  3
d  4  2  3

The columns "x" and "z" have non-duplicate values, so I want to pick them out and create a new data frame like:

   x  z
a  1  3
b  1  2
c  1  3
d  4  3

This can be realized by the following code:

import pandas as pd
df = pd.DataFrame([[1,2,3],[1,2,2],[1,2,3],[4,2,3]],index=['a','b','c','d'],columns=['x','y','z'])
df0 = pd.DataFrame()
for i in range(df.shape[1]):
    if df.iloc[:,i].nunique() > 1:
        df1 = df.iloc[:,i].T
        df0 = pd.concat([df0,df1],axis=1, sort=False)

However, there must be simpler and more direct methods. What are they?

Best regards

When you say "non-duplicate values", do you mean that the column does not contain the same value in every row? — impulso, Mar 21 '19 at 16:51
Possible duplicate of [python panda remove duplicate columns](https://stackoverflow.com/questions/14984119/python-pandas-remove-duplicate-columns) — Clement Lombard, Mar 21 '19 at 16:54

score 4 · Accepted Answer · answered Mar 21 '19 at 17:04

df[df.columns[(df.nunique()!=1).values]]

Maybe you can try this one-liner.
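As a quick check, the one-liner can be run against the frame from the question. The sketch below is only an illustration: the names `mask` and `result` are made up here, and it passes the boolean Series straight to `df.loc`, which should select the same columns as indexing through `df.columns`.

```python
import pandas as pd

# Frame taken from the question
df = pd.DataFrame(
    [[1, 2, 3], [1, 2, 2], [1, 2, 3], [4, 2, 3]],
    index=["a", "b", "c", "d"],
    columns=["x", "y", "z"],
)

# nunique() counts the distinct values in each column: x -> 2, y -> 1, z -> 2
mask = df.nunique() != 1

# Keep only the columns whose count is not 1
result = df.loc[:, mask]
print(result.columns.tolist())  # ['x', 'z']
```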

Justice_Lords


score 3 · Answer 2 · answered Mar 21 '19 at 16:51

Apply nunique, then remove columns where nunique is 1:

nunique = df.apply(pd.Series.nunique)
cols_to_drop = nunique[nunique == 1].index
df = df.drop(cols_to_drop, axis=1)
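For comparison, here is a small sketch of the same steps on the question's frame. It assumes that `df.nunique()` can stand in for `df.apply(pd.Series.nunique)` and uses the `columns=` keyword of `drop` instead of `axis=1`; the names `counts_apply`, `counts_direct` and `trimmed` are illustrative only. Writing the result to a new variable leaves the original `df` untouched.

```python
import pandas as pd

# Frame taken from the question
df = pd.DataFrame(
    [[1, 2, 3], [1, 2, 2], [1, 2, 3], [4, 2, 3]],
    index=["a", "b", "c", "d"],
    columns=["x", "y", "z"],
)

# Two ways to count distinct values per column
counts_apply = df.apply(pd.Series.nunique)
counts_direct = df.nunique()

# Columns with exactly one distinct value are the ones to drop
cols_to_drop = counts_direct[counts_direct == 1].index

# drop(columns=...) is an alternative spelling of drop(..., axis=1)
trimmed = df.drop(columns=cols_to_drop)
print(trimmed.columns.tolist())  # ['x', 'z']
```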

Alex


Akhilesh_IN · Answer 3 · 2019-03-21T18:45:31.997

1

df = df[df.columns[df.nunique() > 1]]

This assumes that a column whose values are all the same gives nunique == 1, while every other column gives more than 1. df.columns[df.nunique() > 1] then returns the names of all columns that meet the requirement.
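One edge case where `> 1` and `!= 1` may disagree is a column that holds only missing values: `nunique()` skips NaN by default, so such a column gets a count of 0. The sketch below uses a made-up frame `df_nan` (not from the question) to show the difference; which behaviour is preferable depends on the data.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with a constant column and an all-NaN column
df_nan = pd.DataFrame(
    {"x": [1, 1, 4], "y": [2, 2, 2], "empty": [np.nan, np.nan, np.nan]}
)

# NaN is not counted by default: x -> 2, y -> 1, empty -> 0
counts = df_nan.nunique()

# "> 1" drops the all-NaN column, "!= 1" keeps it
kept_gt = df_nan.loc[:, counts > 1].columns.tolist()
kept_ne = df_nan.loc[:, counts != 1].columns.tolist()
print(kept_gt)  # ['x']
print(kept_ne)  # ['x', 'empty']
```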

edited Mar 21 '19 at 18:45

answered Mar 21 '19 at 17:22

Akhilesh_IN


score 0 · Answer 4 · answered Mar 21 '19 at 16:56

A simple one-liner:

df0 = df.loc[:, (df.max() - df.min()) != 0]

or, even better:

df0 = df.loc[:, df.max() != df.min()]
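A short sketch of the max/min comparison on the question's frame, for illustration only. Note the assumption it relies on: the column values must be orderable so that `max()` and `min()` are defined, and the subtraction variant additionally needs numeric data. The name `mask` is made up here.

```python
import pandas as pd

# Frame taken from the question (all columns numeric)
df = pd.DataFrame(
    [[1, 2, 3], [1, 2, 2], [1, 2, 3], [4, 2, 3]],
    index=["a", "b", "c", "d"],
    columns=["x", "y", "z"],
)

# A column is constant exactly when its maximum equals its minimum:
# x -> 4 vs 1, y -> 2 vs 2, z -> 3 vs 2
mask = df.max() != df.min()

df0 = df.loc[:, mask]
print(df0.columns.tolist())  # ['x', 'z']
```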

Lior Cohen

