4

I have pandas DataFrame with 2 rows:

+---------+---------+-----------+------------+
|       ID|     Type|      Index|        Code|
+---------+---------+-----------+------------+
|111111111|       aa|          1|         XXX|
|111111111|       aa|       null|         XXX|

How can I show those columns that have different values between rows of df? (there might be more than 2 rows).

In the above-shown example, the expected output is Index, because the first row has Index equal to 1 and the second row has Index equal to null.

Fluxy
  • 2,838
  • 6
  • 34
  • 63
  • 1
    Are you looking for `nunique`? `df.columns[df.nunique(axis=0, dropna=False).gt(1)] ` – cs95 Aug 05 '19 at 19:58
  • Are you trying to compare values across every row? Or consecutive rows only? Are you claiming that every row should have the same column values (with exceptions)? – cs95 Aug 05 '19 at 20:01
  • 1
    @cs95: Across all existing rows in `df`. – Fluxy Aug 05 '19 at 20:08
  • 1
    So, can you confirm that `df.columns[df.nunique(axis=0, dropna=False).gt(1)]` is what you want? Or maybe you want to do this per ID as rafaelc mentioned. – cs95 Aug 05 '19 at 20:09

4 Answers4

2

You can simply select the column/s (or the dataframe) with columns having unique values more than 1. You can get those columns whose values are distinct with

def cols_having_unique(df):
    my_cols = []
    for col in df.columns:
        if df[col].nunique(dropna=False) > 1:
            my_cols.append(col)
    return df[my_cols].copy()

new_df = cols_having_unique(df)
null
  • 1,944
  • 1
  • 14
  • 24
1

You can use this code:

for col in df.columns:
   if len(df[col].value_counts()) > 1:
      print('The column ' +  col + ' has different values')
Noha Elprince
  • 1,924
  • 1
  • 16
  • 10
1

Loop through pandas.DataFrame.columns and check if two rows have the same value for every column.

import pandas as pd
data=[
    [111111111,'aa',1,'XXX'],
    [111111111,'aa','null','XXX'],
    [111111111,'aa','null','YYY']
]
df=pd.DataFrame(data, columns=["ID", "Type", "Index", "Code"])

def get_different_columns(r1, r2):
    different_col_list=[]
    for column in df.columns:
        if df.iloc[r1][column] != df.iloc[r2][column]:
            different_col_list.append(column)
    return different_col_list

print(get_different_columns(0,1))
#['Index']

print(get_different_columns(1,2))
#['Code']

print(get_different_columns(0,2))
#['Index', 'Code']
Bitto
  • 7,937
  • 1
  • 16
  • 38
-3

I suppose you can count distinct values in a single column by iterating through all the columns.

How to count distinct values: Count unique values with pandas per groups

Peiqin
  • 378
  • 4
  • 12
  • 2
    If this is a duplicate, please vote to close as duplicate or leave a comment. **Do not** make an answer that just links to the duplicate. – user3483203 Aug 05 '19 at 20:00