Checking If Identical Column Names Have Identical Values in Pandas

Question

So I have a DataFrame that has multiple column names that are the same. I firstly separate these columns and create a DataFrame that has these duplicated column names. What I was wondering, is there anyway to double check if all of the values in the rows have the same value as each other? In which case, I know that one column is an exact duplicate of another and I can drop it?

Below is what I have gotten so far and an example DF:

import collections
duplicated_columns = [item for item, count in collections.Counter(checking_empty_list).items() if count > 1]

dupe[duplicated_columns].head()

That is a simple way to obtain the columns that appear twice in a dataframe and then I am just displaying the head. I get the following df below:

In this case, ID would be all true and we can delete one of the columns, however total wouldn't be true and we won't delete any as the values are different. I thought of doing but it is unable to differentiate. How would I do it for multiple duplicated columns which could differ each time?

if df['ID'] == df['ID'].all() == True:
    df.drop(['ID'])
else:
    pass

Here is the code to get it into a df in Python.

ID,ID,TOTAL,TOTAL
0.0,0.0,290,0.00
0.0,0.0,16,60.16
0.0,0.0,0,0.00
0.0,0.0,87,0.87
0.0,0.0,78,0.78
df = pd.read_clipboard(sep=',')
df = df.rename(columns = {'TOTAL.1':'TOTAL', 'ID.1':'ID'})

Any help would be appreciated.

Does this answer your question? [python pandas remove duplicate columns](https://stackoverflow.com/questions/14984119/python-pandas-remove-duplicate-columns) — K450, Nov 22 '21 at 13:28
The transpose method seems to be good but some of my dataframes are around 300,000 rows in length which means that it would hang and not work. I sdaw dropping the duplicated column names, however in my case it would mean that a column which doesn't have identical column values would be dropped. — Nhyi, Nov 22 '21 at 13:34
what's wrong with [the second solution in this answer](https://stackoverflow.com/a/32961145/3944322)? (in modern panda you need to import `array_equivalent` from `pandas.core.dtypes.missing`) — Stef, Nov 22 '21 at 16:17
Tried using it but had no luck and the dataframe was returning a dataframe with no values. However, with the df.T.drop_duplicates().T worked fine still. The second solution only returns a list of the duplicated columns and not the actual dataframe. — Nhyi, Nov 23 '21 at 10:13

score 1 · Accepted Answer · answered Nov 24 '21 at 02:40

Ok, as the transpose alternative didn't work. I've tried your code.

to use all() you should delimit the expression with (); the drop method is not in place and the axis should be equals 1 (column).

if (df.loc[:,'ID'] == df.loc[:,"ID.1"]).all():
    df.drop('ID.1', axis= 1,inplace=True)
else:
    pass

df
Out[42]: 
   ID  TOTAL  TOTAL.1
0   0    290     0.00
1   0     16    60.16
2   0      0     0.00
3   0     87     0.87
4   0     78     0.78

score 0 · Answer 2 · answered Nov 22 '21 at 16:53

You can transpose the DataFrame and then use drop_duplicates.

df.T.drop_duplicates()
Out[6]: 
           0      1    2      3      4
ID       0.0   0.00  0.0   0.00   0.00
TOTAL  290.0  16.00  0.0  87.00  78.00
TOTAL    0.0  60.16  0.0   0.87   0.78

To get the desired table, just transpose it back:

new_df = df.T.drop_duplicates() 
new_df.T
Out[8]: 
    ID  TOTAL  TOTAL
0  0.0  290.0   0.00
1  0.0   16.0  60.16
2  0.0    0.0   0.00
3  0.0   87.0   0.87
4  0.0   78.0   0.78

This is definitely the way I want to go, however the issue is that with large dataframes, this calculation takers too long and is inefficient. — Nhyi, Nov 23 '21 at 10:14

Checking If Identical Column Names Have Identical Values in Pandas

2 Answers2