We can use `np.unique` over axis 1. Unfortunately, pandas has no built-in function to drop duplicate columns; `df.drop_duplicates` only removes duplicate rows. From its docs:

Return DataFrame with duplicate rows removed.

We can write a function around `np.unique` to drop duplicate columns.
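import numpy as np
import pandas as pd

def drop_duplicate_cols(df):
    # idxs holds the position of the first occurrence of each unique column
    uniq, idxs = np.unique(df, return_index=True, axis=1)
    return pd.DataFrame(uniq, index=df.index, columns=df.columns[idxs])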
drop_duplicate_cols(X)
    X1   Y1
0  0.0  6.0
1  3.0  7.1
2  7.6  1.2
NB: from the `np.unique` docs:

Returns the sorted unique elements of an array.

Workaround: to retain the original column order, sort `idxs` and select the columns by position.
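The workaround can be sketched as follows. This is a minimal sketch, not a definitive implementation: the function name `drop_duplicate_cols_keep_order` and the sample frame are made up for illustration, and it assumes all columns are numeric, since as far as I know `np.unique` does not accept the `axis` argument for object arrays.

```python
import numpy as np
import pandas as pd

def drop_duplicate_cols_keep_order(df):
    # Hypothetical helper: assumes a frame with purely numeric columns.
    # return_index gives the position of the first occurrence of each unique column.
    _, idxs = np.unique(df.to_numpy(), return_index=True, axis=1)
    # Sorting the positions restores the original left-to-right column order.
    return df.iloc[:, np.sort(idxs)]

# Illustrative data (not from the question): Y2 duplicates Y1.
sample = pd.DataFrame({'Y1': [6.0, 7.1, 1.2],
                       'X1': [0.0, 3.0, 7.6],
                       'Y2': [6.0, 7.1, 1.2]})
print(drop_duplicate_cols_keep_order(sample).columns.tolist())  # ['Y1', 'X1']
```

Selecting with `df.iloc` instead of rebuilding the frame from `uniq` also keeps each column's original dtype.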
Using `.T` on a DataFrame that has multiple dtypes is going to mess with your actual dtypes.
df = pd.DataFrame({'A': [0, 1], 'B': ['a', 'b'], 'C': [0, 1], 'D':[2.1, 3.1]})
df.dtypes
A int64
B object
C int64
D float64
dtype: object
df.T.T.dtypes
A object
B object
C object
D object
dtype: object
# To get back original `dtypes` we can use `.astype`
df.T.T.astype(df.dtypes).dtypes
A int64
B object
C int64
D float64
dtype: object
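Putting the two pieces together, here is a hedged sketch of dropping duplicate columns through the transpose and then restoring the dtypes with `.astype`. The helper name `drop_duplicate_cols_via_transpose` is hypothetical, and the sketch assumes the column labels are unique so that each surviving column can be matched to its original dtype.

```python
import pandas as pd

def drop_duplicate_cols_via_transpose(df):
    # Hypothetical helper: assumes unique column labels.
    # Transposing turns columns into rows, so drop_duplicates can remove them.
    deduped = df.T.drop_duplicates().T
    # The round trip through .T upcasts mixed columns to object,
    # so cast each surviving column back to its original dtype.
    return deduped.astype(df.dtypes[deduped.columns].to_dict())

df = pd.DataFrame({'A': [0, 1], 'B': ['a', 'b'], 'C': [0, 1], 'D': [2.1, 3.1]})
result = drop_duplicate_cols_via_transpose(df)
print(result.columns.tolist())  # ['A', 'B', 'D']
print(result.dtypes)
```

Only the dtypes of the surviving columns are passed to `.astype`, because a mapping that names a dropped column such as `C` would raise an error.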