So, I have this DataFrame with almost 3,000 rows that looks something like this:
CITIES
0 ['A','B']
1 ['A','B','C','D']
2 ['A','B','C']
4 ['X']
5 ['X','Y','Z']
... ...
2670 ['Y','Z']
I would like to remove from the DataFrame every row whose 'CITIES' list is contained in another row's list (the order does not matter). In the example above, I would like to remove rows 0 and 2, since both are contained in row 1, and also rows 4 and 2670, since both are contained in row 5. I tried something that kind of worked, but it was really clumsy and took almost 10 minutes to compute. This was it:
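indexesToRemove = []
# enumerate gives the row's position, which iloc needs (the index labels are not contiguous)
for pos, (index, row) in enumerate(entrada.iterrows()):
    citiesListFixed = set(row['CITIES'])
    # only later rows are checked, so a subset listed after its superset is missed
    for index2, row2 in entrada.iloc[pos + 1:].iterrows():
        if citiesListFixed <= set(row2['CITIES']):
            indexesToRemove.append(index)
            break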
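For reference, the removal rule described above can be written as a plain brute-force check that compares each row against every other row (not only the later ones), using frozensets so that order is ignored. This is a minimal sketch on the sample rows shown earlier. It assumes that rows holding exactly the same set count as contained in each other and that only the first of them is kept, which the question does not specify. `contained_rows` is a hypothetical helper name.

```python
import pandas as pd

# Sample rows from the question; index labels kept as shown (label 3 is absent)
entrada = pd.DataFrame(
    {"CITIES": [["A", "B"], ["A", "B", "C", "D"], ["A", "B", "C"],
                ["X"], ["X", "Y", "Z"], ["Y", "Z"]]},
    index=[0, 1, 2, 4, 5, 2670],
)


def contained_rows(df, col="CITIES"):
    """Hypothetical helper: index labels of rows whose city set is
    contained in another row's set.

    Assumption: if several rows hold the same set, only the first is kept.
    """
    sets = [frozenset(cities) for cities in df[col]]
    labels = df.index.tolist()
    to_remove = []
    for i, s in enumerate(sets):
        for j, t in enumerate(sets):
            if i == j:
                continue
            # proper subset of any other row, or a duplicate of an earlier row
            if s < t or (s == t and j < i):
                to_remove.append(labels[i])
                break
    return to_remove


print(contained_rows(entrada))  # [0, 2, 4, 2670]
```

The matching rows can then be dropped with `entrada.drop(contained_rows(entrada))`. This is still a quadratic number of comparisons; it only avoids `iterrows` and fixes which rows get compared.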
Is there a more efficient way to do this?
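One possible direction, sketched under the assumption that NumPy is available: one-hot encode each row's cities into a 0/1 matrix `M`. A single matrix product `M @ (1 - M).T` then gives, for every pair of rows (i, j), how many cities of row i are missing from row j, and a zero means row i is contained in row j. This replaces the nested `iterrows` loops with vectorized operations, but it still builds n × n matrices, so memory grows quadratically (one float32 n × n matrix is about 36 MB at 3,000 rows). As an assumption, rows holding exactly the same set are treated as duplicates and only the first is kept. `drop_contained` is a hypothetical helper name.

```python
import numpy as np
import pandas as pd


def drop_contained(df, col="CITIES"):
    """Hypothetical helper: drop rows whose city set is contained in another row's.

    Assumption: of several rows holding the same set, only the first is kept.
    """
    cities = sorted({c for row in df[col] for c in row})
    pos = {c: k for k, c in enumerate(cities)}
    n = len(df)

    # One-hot matrix: M[i, k] = 1 if row i contains city k
    M = np.zeros((n, len(cities)), dtype=np.float32)
    for i, row in enumerate(df[col]):
        for c in set(row):
            M[i, pos[c]] = 1.0

    # missing[i, j] = number of cities in row i that are absent from row j
    # (small integer counts, so float32 represents them exactly)
    missing = M @ (1.0 - M).T
    size = M.sum(axis=1)

    contained = missing == 0            # row i is a subset of row j
    np.fill_diagonal(contained, False)  # ignore each row compared with itself

    proper = contained & (size[:, None] < size[None, :])
    earlier = np.tri(n, k=-1, dtype=bool)  # earlier[i, j] is True when j < i
    duplicate = contained & (size[:, None] == size[None, :]) & earlier

    remove = (proper | duplicate).any(axis=1)
    return df[~remove]


# Sample rows from the question; index labels kept as shown
entrada = pd.DataFrame(
    {"CITIES": [["A", "B"], ["A", "B", "C", "D"], ["A", "B", "C"],
                ["X"], ["X", "Y", "Z"], ["Y", "Z"]]},
    index=[0, 1, 2, 4, 5, 2670],
)
print(drop_contained(entrada).index.tolist())  # [1, 5]
```

float32 is chosen because NumPy hands floating-point matrix products to BLAS, while integer matrix products are not BLAS-accelerated.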