I have a DataFrame in which several column names are repeated. I first pick out these columns and create a DataFrame containing only the columns with duplicated names. Is there any way to check whether the values in two same-named columns are identical in every row? If they are, I know that one column is an exact duplicate of the other and I can drop it.
Below is what I have so far, along with an example DataFrame:
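import collections
# checking_empty_list presumably holds the column names of dupe (assumption: list(dupe.columns))
duplicated_columns = [item for item, count in collections.Counter(checking_empty_list).items() if count > 1]
# show the first rows of every column whose name is repeated
dupe[duplicated_columns].head()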
That is a simple way to obtain the column names that appear more than once in a DataFrame; I then just display the head. The result has the columns ID, ID, TOTAL, TOTAL (the sample data is given further down).
In this case, the check for ID would be all True and we could delete one of the columns, whereas for TOTAL it would not be True and we would not delete anything, because the values differ. I thought of doing the following, but it cannot differentiate between the two columns. How would I do this for multiple duplicated columns, which could be different each time?
if df['ID'] == df['ID'].all() == True:
    df.drop(['ID'])
else:
    pass
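As far as I can tell, part of the reason this cannot differentiate is that, with duplicated names, `df['ID']` selects both ID columns at once and returns a DataFrame rather than a single Series. A minimal sketch of that behaviour, assuming the sample values from this question are built directly with duplicate names (the variable names `selected`, `first_id` and `second_id` are only illustrative):

```python
import pandas as pd

# Sample values from the question, built directly with duplicate column names.
df = pd.DataFrame(
    [[0.0, 0.0, 290, 0.00],
     [0.0, 0.0, 16, 60.16],
     [0.0, 0.0, 0, 0.00],
     [0.0, 0.0, 87, 0.87],
     [0.0, 0.0, 78, 0.78]],
    columns=["ID", "ID", "TOTAL", "TOTAL"],
)

# Label-based selection returns every column carrying that name.
selected = df["ID"]
print(type(selected).__name__)  # DataFrame
print(selected.shape)           # (5, 2)

# Positional selection picks out one physical column at a time,
# so the two ID columns can be compared with each other.
first_id = df.iloc[:, 0]
second_id = df.iloc[:, 1]
print(first_id.equals(second_id))  # True
```

So it seems any comparison would have to address the columns by position (`iloc`) rather than by name.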
Here is the data and the code to get it into a DataFrame in Python (copy the data to the clipboard first).
ID,ID,TOTAL,TOTAL
0.0,0.0,290,0.00
0.0,0.0,16,60.16
0.0,0.0,0,0.00
0.0,0.0,87,0.87
0.0,0.0,78,0.78
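import pandas as pd
df = pd.read_clipboard(sep=',')
# read_clipboard renames the repeated headers to ID.1 and TOTAL.1, so restore the duplicate names
df = df.rename(columns = {'TOTAL.1':'TOTAL', 'ID.1':'ID'})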
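To make the intended behaviour concrete, here is a rough sketch of the check I am after, not a definitive solution. The helper name `drop_identical_duplicates` is hypothetical, and the sketch assumes that "exact duplicate" means `Series.equals` returns True, which also requires the two columns to share the same dtype:

```python
import pandas as pd

def drop_identical_duplicates(frame):
    """Drop a repeated-name column only if it equals an earlier kept column of the same name."""
    keep = []  # positions of the columns to retain
    for pos in range(frame.shape[1]):
        name = frame.columns[pos]
        col = frame.iloc[:, pos]
        # A column is a copy if a kept column has the same name and the same values.
        is_copy = any(
            frame.columns[k] == name and frame.iloc[:, k].equals(col)
            for k in keep
        )
        if not is_copy:
            keep.append(pos)
    return frame.iloc[:, keep]

# Sample values from the question, built directly with duplicate column names.
df = pd.DataFrame(
    [[0.0, 0.0, 290, 0.00],
     [0.0, 0.0, 16, 60.16],
     [0.0, 0.0, 0, 0.00],
     [0.0, 0.0, 87, 0.87],
     [0.0, 0.0, 78, 0.78]],
    columns=["ID", "ID", "TOTAL", "TOTAL"],
)

result = drop_identical_duplicates(df)
print(list(result.columns))  # ['ID', 'TOTAL', 'TOTAL']
```

With the sample data, the second ID column is dropped because it matches the first, while both TOTAL columns are kept because their values differ.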
Any help would be appreciated.