How can I keep all the rows in my dataframe that have the same values in all columns?

Question

I want to find all the rows that have the same values across all columns, or more specifically, drop all the rows that have difference. I was thinking I would iterate through each column and check for the greatest majority and remove the other rows. I feel like this method is not the best way to utilize a dataframe.

I was checking previous posts and someone had a similar question but they wanted the inverse result so I will use their example:

Input:

index  A  B  C  D  E  F ....
 0     1  2  3  4  2  2
 1     1  2  3  4  2  2
 2     5  5  5  5  5  5 
 3     7  7  6  7  7  7

Desired Output:

index  A  B  C  D  E  F ....
 0     1  2  3  4  2  2
 1     1  2  3  4  2  2

There can be many columns here.

Edit: The example I provided is wrong. Sorry I was tired. I have updated the example. Now that I have typed out my question I think I am basically just asking how do I find the rows that are all the same. If there are multiple groups of identical rows I want to know those as well.

Which post are you referring to? I found this one which is almost what you need: https://stackoverflow.com/questions/22701799/pandas-dataframe-find-rows-where-all-columns-equal — panktijk, Feb 12 '19 at 22:26
@panktijk I don't want all columns equal. I realize now that the example I used is not correct. I want all rows equal. I will edit my post. — nchuang, Feb 13 '19 at 15:06

BENY · Accepted Answer · 2019-02-13T15:44:07.183

2

Using nunique

yourdf=df[df.nunique(1)==1]
yourdf
       A  B  C  D  E  F
index                  
1      2  2  2  2  2  2
2      5  5  5  5  5  5

Update

df[df.duplicated(keep=False)]
Out[11]: 
       A  B  C  D  E  F
index                  
0      1  2  3  4  2  2
1      1  2  3  4  2  2

edited Feb 13 '19 at 15:44

answered Feb 12 '19 at 22:27

BENY

317,841
20
164
234

I am sorry about my OP, I meant I wanted all the rows that were the same. – nchuang Feb 13 '19 at 15:32
this works, but is there a way to sort it so I know which groups belong together? this only gives me Boolean if there is a duplicate – nchuang Feb 13 '19 at 15:50
@nchuang You can sort on all columns: `df[df.duplicated(keep=False)].sort_values(list(df.columns))` – panktijk Feb 13 '19 at 19:16
@ after filtered, `df.groupby(list(df))).ngroup()`, then you get the group number here , – BENY Feb 13 '19 at 19:19
`sort_values()` works but the `df.groupby(list(df))).ngroup()` gives every row -1? – nchuang Feb 13 '19 at 19:54
I tested it again on a much smaller set and it works. I guess it is my larger set, which I don't understand because I literally copied one of the rows 3x and `ngroup()` called them 1,2,3 instead of 1,1,1. – nchuang Feb 13 '19 at 21:13
thank you it does work. My original dataset had an empty column at the end I didn't see. Once I removed it it works. – nchuang Feb 13 '19 at 22:00

score 0 · Answer 2 · answered Feb 12 '19 at 22:23

0

This is probably not a very efficient solution, but if your DataFrame is small enough, this should work:

df[df.std(axis=1)==0]

(I'm assuming all values are numerical)

answered Feb 12 '19 at 22:23

Tacratis

1,035
1
6
16

it's actually letters sorry – nchuang Feb 13 '19 at 15:04

How can I keep all the rows in my dataframe that have the same values in all columns?

2 Answers2