Approach 1 - Uses Transpose
This approach first transposes the DataFrame so that columns become rows. It then flags the duplicate rows (i.e. the originally duplicate columns) with the duplicated() method, keeping only the first occurrence. Next, it takes the names of the remaining rows and selects those columns from the original DataFrame. Finally, it assigns the result to df_output.
# Transpose the DataFrame to make columns as rows
transposed_df = df.transpose()
# Find duplicate columns (excluding the first occurrence)
duplicate_columns = transposed_df.duplicated(keep='first')
# Get the unique column names
unique_columns = transposed_df[~duplicate_columns].index
# Select only the unique columns from the original DataFrame
df_output = df[unique_columns].copy()
# Print the resulting DataFrame
print(df_output)
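As a quick sanity check, the snippet above can be run end to end on a small sample DataFrame (the column names and values here are made up for illustration):

```python
import pandas as pd

# Sample DataFrame: column 'B' duplicates column 'A'
df = pd.DataFrame({
    'A': [1, 2, 3],
    'B': [1, 2, 3],
    'C': [4, 5, 6],
})

# Transpose so that columns become rows
transposed_df = df.transpose()
# Mark duplicate rows (originally duplicate columns), keeping the first
duplicate_columns = transposed_df.duplicated(keep='first')
# Names of the columns to keep
unique_columns = transposed_df[~duplicate_columns].index
# Select only the unique columns from the original DataFrame
df_output = df[unique_columns].copy()

print(list(df_output.columns))  # ['A', 'C']
```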
What if the IDs contain duplicates?
In this updated version, the index is reset first with df.reset_index(inplace=True), which converts the Id column into a regular column. After the duplicate columns are removed, the Id column is restored as the index with df_output.set_index('Id', inplace=True).
By resetting and then reassigning the index, rows with duplicate IDs survive the column selection, so they are preserved in the resulting DataFrame.
# Reset the index to convert the Id column to a regular column
df.reset_index(inplace=True)
# Transpose the DataFrame to make columns as rows
transposed_df = df.transpose()
# Find duplicate columns (excluding the first occurrence)
duplicate_columns = transposed_df.duplicated(keep='first')
# Get the unique column names
unique_columns = transposed_df[~duplicate_columns].index
# Select only the unique columns from the original DataFrame
df_output = df[unique_columns].copy()
# Set the Id column as the index again
df_output.set_index('Id', inplace=True)
print(df_output)
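The same steps can be demonstrated on a sample DataFrame whose Id index contains a repeated value (the data is illustrative):

```python
import pandas as pd

# Sample DataFrame with a duplicated 'Id' index value
df = pd.DataFrame(
    {'A': [1, 2, 3], 'B': [1, 2, 3], 'C': [7, 8, 9]},
    index=pd.Index([10, 10, 20], name='Id'),
)

# Move 'Id' into a regular column so duplicate IDs are preserved
df.reset_index(inplace=True)
transposed_df = df.transpose()
duplicate_columns = transposed_df.duplicated(keep='first')
unique_columns = transposed_df[~duplicate_columns].index
df_output = df[unique_columns].copy()
# Restore 'Id' as the index
df_output.set_index('Id', inplace=True)

print(list(df_output.columns))  # ['A', 'C'] -- 'B' was dropped
print(list(df_output.index))    # [10, 10, 20] -- duplicate IDs kept
```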
Approach 2 - uses the nunique() method to drop columns with only one unique value
# Get the counts of unique values per column
value_counts = df.apply(lambda x: x.nunique())
# Keep columns with more than one unique value
unique_columns = value_counts[value_counts > 1].index
# Keep only the unique columns
df_output = df[unique_columns].copy()
# Print the resulting DataFrame
print(df_output)
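Note that this approach removes constant columns (a single repeated value) rather than comparing columns against each other. A small runnable sketch, with illustrative data:

```python
import pandas as pd

# Sample DataFrame: 'const' holds a single repeated value
df = pd.DataFrame({
    'A': [1, 2, 3],
    'const': [5, 5, 5],
    'C': [4, 5, 6],
})

# Number of distinct values per column
value_counts = df.apply(lambda x: x.nunique())
# Keep columns with more than one unique value (drops 'const')
unique_columns = value_counts[value_counts > 1].index
df_output = df[unique_columns].copy()

print(list(df_output.columns))  # ['A', 'C']
```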
What if the IDs are duplicated?
After keeping only the unique columns, the duplicate IDs are identified with df_output.index.duplicated(). The index is then reset so the Id column becomes a regular column, and the rows with duplicate IDs are removed with df_output[~df_output['Id'].duplicated()]. Finally, the Id column is restored as the index with df_output.set_index('Id', inplace=True).
This way, you can handle duplicate IDs while removing duplicate columns based on the uniqueness of the values.
# Get the counts of unique values per column
value_counts = df.apply(lambda x: x.nunique())
# Keep columns with more than one unique value
unique_columns = value_counts[value_counts > 1].index
# Keep only the unique columns
df_output = df[unique_columns].copy()
# (Informational) IDs that appear more than once in the index
duplicate_ids = df_output.index[df_output.index.duplicated()]
# Reset the index to make Id a regular column
df_output.reset_index(inplace=True)
# Remove duplicate IDs from the DataFrame
df_output = df_output[~df_output['Id'].duplicated()]
# Set the ID column as the index again
df_output.set_index('Id', inplace=True)
print(df_output)
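Putting the two steps together, the full snippet can be exercised on a sample DataFrame with both a constant column and a duplicated Id (the data is made up for illustration):

```python
import pandas as pd

# Sample DataFrame: constant column 'const', duplicated Id 10
df = pd.DataFrame(
    {'A': [1, 2, 3], 'const': [5, 5, 5]},
    index=pd.Index([10, 10, 20], name='Id'),
)

# Keep only columns with more than one unique value
value_counts = df.apply(lambda x: x.nunique())
unique_columns = value_counts[value_counts > 1].index
df_output = df[unique_columns].copy()

# Drop rows whose Id has already been seen, then restore the index
df_output.reset_index(inplace=True)
df_output = df_output[~df_output['Id'].duplicated()]
df_output.set_index('Id', inplace=True)

print(list(df_output.index))  # [10, 20] -- second Id 10 removed
print(list(df_output['A']))   # [1, 3]
```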