Drop columns that aren't similar to another column in the df

Question

Hi, I want to compare the data between John and Kelly. Sometimes a John column (here "world_john") has no affiliated Kelly column ("world_kelly"), and vice versa. In that case world_john needs to be removed, since there is no comparison to make. Is it possible to do this with general code?

df
   world_john  fruit_list_john  fruit_list_kelly  output_john  output_kelly
0  The start   hungry           banana            high         high
1  world       pear             apple             high         high
2  yesterday   fruit            pear              high         high
...

Expected Output:

   fruit_list_john  fruit_list_kelly  output_john  output_kelly
0  hungry           banana            high         high
1  pear             apple             high         high
2  fruit            pear              high         high

@ScootCork exactly, that’s why world_john must be deleted. I updated the question for clarity — asd, Feb 16 '21 at 13:39
So you just want to drop the 'world_john' column? if so see https://stackoverflow.com/questions/13411544/delete-column-from-pandas-dataframe — ScootCork, Feb 16 '21 at 14:06

ScootCork · Accepted Answer · 2021-02-17T11:02:54.837

If I understand correctly, you want to drop any 'john' column for which there is no 'kelly' column, and vice versa. One way to do this is to loop over the columns and check whether each one has a counterpart. If it doesn't, we drop the column.

import pandas as pd

df = pd.DataFrame({'world_john':[1], 'fruit_list_john':[1], 'fruit_list_kelly':[1], 'output_john':[1], 'output_kelly':[1]})

# the loop iterates over the original column index, so reassigning df inside it is safe
for col in df.columns:
    # drop a john column that has no kelly counterpart
    if col.endswith('_john') and col.replace('_john', '_kelly') not in df.columns:
        df = df.drop(columns=[col])
    # drop a kelly column that has no john counterpart
    if col.endswith('_kelly') and col.replace('_kelly', '_john') not in df.columns:
        df = df.drop(columns=[col])
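
If the suffixes vary, or there are more than two people, the same idea can be wrapped in a small helper. This is only a sketch: the function name `drop_unpaired` and its `suffixes` parameter are made up for illustration, and it assumes every person's columns end with a known suffix. It strips the suffix by slicing rather than with `str.replace`, so a suffix that also appears in the middle of a column name is left alone. Columns that carry none of the suffixes are kept untouched.

```python
import pandas as pd


def drop_unpaired(df, suffixes=("_john", "_kelly")):
    """Hypothetical helper: keep suffixed columns only if every suffix has that base name."""
    # For each suffix, collect the base names (column name minus the suffix).
    bases = [{c[:-len(s)] for c in df.columns if c.endswith(s)} for s in suffixes]
    # A base name is "paired" only if it shows up under every suffix.
    paired = set.intersection(*bases)

    keep = []
    for c in df.columns:
        matched = [s for s in suffixes if c.endswith(s)]
        # Keep columns without any suffix, and suffixed columns whose base is paired.
        if not matched or c[:-len(matched[0])] in paired:
            keep.append(c)
    return df[keep]


df = pd.DataFrame({'world_john': [1], 'fruit_list_john': [1], 'fruit_list_kelly': [1],
                   'output_john': [1], 'output_kelly': [1]})
result = drop_unpaired(df)
print(list(result.columns))
```

Selecting with `df[keep]` preserves the original column order and returns a new DataFrame, so the input is not modified.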
