I have a dataframe with multiple columns, and I want to extract the rows in which the values of two columns almost match (basically, the entry in one column should be part of the entry in the other column).
Suppose the whole dataframe is df, and the columns in question are Rubriken-Gruppe and User Gruppe. This is how I achieved what I want, but I wonder whether there is a more elegant/faster way to do this:
only_groups = df[['Rubriken-Gruppe', 'User Gruppe']]
same_flag = []
for index, row in only_groups.iterrows():
    # True when the Rubriken-Gruppe entry is one of the space-separated words in User Gruppe
    same_flag.append(row['Rubriken-Gruppe'] in row['User Gruppe'].split(' '))
# Keep only the rows flagged as matching
same_groups = df[same_flag]
The dataframe same_groups contains the desired result.
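For comparison, the same mask could probably be built without iterrows by zipping the two columns in a list comprehension. This is only a minimal sketch: the sample data is made up for illustration (only the two column names come from my real dataframe), and it assumes both columns contain strings with no missing values.

```python
import pandas as pd

# Hypothetical sample data; only the two column names are taken from the question.
df = pd.DataFrame({
    'Rubriken-Gruppe': ['A', 'B', 'C'],
    'User Gruppe': ['A X', 'Y Z', 'Q C'],
})

# Pair up the values of both columns and test membership row by row.
mask = [
    rubrik in user.split(' ')
    for rubrik, user in zip(df['Rubriken-Gruppe'], df['User Gruppe'])
]

# Boolean indexing with the list keeps only the matching rows.
same_groups = df[mask]
```

This avoids constructing a Series object per row, which is where iterrows tends to spend most of its time, but it is still a Python-level loop rather than a vectorized operation.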