Suppose I have a DataFrame df2 with a column called 'elements' in which each row holds a list of objects, as shown below:
print(df2['elements'])
0 [Element B, Element Cr, Element Re]
1 [Element B, Element Rh, Element Sc]
2 [Element B, Element Mo, Element Y]
3 [Element Al, Element B, Element Lu]
4 [Element B, Element Dy, Element Os]
I would like to search through the column and, if for example Element Mo is in a row, delete that whole row so the result looks like this:
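A minimal sketch of the desired result, assuming each cell is a real Python list of strings (the sample data below is hypothetical and typed by hand from the printout). It builds a boolean mask with `Series.apply` and keeps only the rows where the mask is False, with no explicit loop or in-place `drop`:

```python
import pandas as pd

# Hypothetical sample data mirroring the printout above, with real lists in each cell
df2 = pd.DataFrame({'elements': [
    ['Element B', 'Element Cr', 'Element Re'],
    ['Element B', 'Element Rh', 'Element Sc'],
    ['Element B', 'Element Mo', 'Element Y'],
    ['Element Al', 'Element B', 'Element Lu'],
    ['Element B', 'Element Dy', 'Element Os'],
]})

# True for rows whose list contains 'Element Mo' as an exact item
has_mo = df2['elements'].apply(lambda items: 'Element Mo' in items)

# Keep the other rows and renumber the index 0..n-1, as in the desired output
result = df2[~has_mo].reset_index(drop=True)
print(result)
```

The `reset_index(drop=True)` call is only there to reproduce the 0 to 3 numbering shown above; omit it to keep the original index labels.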
print(df2['elements'])
0 [Element B, Element Cr, Element Re]
1 [Element B, Element Rh, Element Sc]
2 [Element Al, Element B, Element Lu]
3 [Element B, Element Dy, Element Os]
I'm currently trying to do it with a for loop and an if statement like this:
for entry in df2['elements']:
    if 'Element Mo' in entry:
        df2.drop(index=[entry], axis=0, inplace=True)
    else:
        continue
But it does not work and raises KeyError: [] not found in axis.
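Iterating over `df2['elements']` yields the cell values, not the index labels, so `drop(index=[entry])` receives the cell's contents rather than a row label, which is consistent with the KeyError. Dropping rows from `df2` while looping over it is also fragile. A minimal sketch, assuming the cells are real lists (the data below is hypothetical), that collects the matching index labels first and then drops them in one call:

```python
import pandas as pd

# Hypothetical sample data: each cell is a real Python list
df2 = pd.DataFrame({'elements': [
    ['Element B', 'Element Cr', 'Element Re'],
    ['Element B', 'Element Rh', 'Element Sc'],
    ['Element B', 'Element Mo', 'Element Y'],
    ['Element Al', 'Element B', 'Element Lu'],
    ['Element B', 'Element Dy', 'Element Os'],
]})

# Series.items() yields (index label, cell value) pairs, so we collect
# the labels that drop() expects instead of the cell contents
labels_to_drop = [label for label, items in df2['elements'].items()
                  if 'Element Mo' in items]

# Drop all matching rows in one call, after the loop has finished
df2 = df2.drop(index=labels_to_drop)
print(df2)
```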
Update:
I just realized that the if/in approach I showed does not only find exact string matches; it also matches strings that contain the target string. For example, with the updated df below:
print(df2['elements'])
0 [Element B, Element Cr, Element Re]
1 [Element B, Element Rh, Element Sc]
2 [Element B, Element Mo, Element Y]
3 [Element Al, Element B, Element Lu]
4 [Element Mop, Element B, Element Lu]
5 [Element B, Element Dy, Element Os]
If I run a for loop with if/in statements like this:
for ind in df2.index.values:
    entry = df2.loc[ind, 'elements']
    if 'Element Mo' in entry:
        df2.drop(index=ind, axis=0, inplace=True)
Both rows 2 and 4 are dropped from df2, because the string 'Element Mop' contains the string 'Element Mo', but I don't want that to happen. I tried updating the code above with a regex, as below, but it doesn't work.
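One likely explanation, assuming the cells are strings (the values in the dictionary under Edit #2 are quoted, e.g. '[Element B, Element Cr, Element Re]'), is that `in` behaves differently on the two types: on a `str` it is a substring test, while on a `list` it checks whether any item is equal to the target. A minimal illustration with hypothetical values:

```python
# On a string, `in` is a substring test
row_as_str = '[Element Mop, Element B, Element Lu]'
print('Element Mo' in row_as_str)   # True: 'Element Mo' is a prefix of 'Element Mop'

# On a list, `in` compares whole items for equality
row_as_list = ['Element Mop', 'Element B', 'Element Lu']
print('Element Mo' in row_as_list)  # False: no item equals 'Element Mo' exactly
```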
for ind in df2.index.values:
    entry = df2.loc[ind, 'elements']
    if '\bElement Mo\b' in entry:
        df2.drop(index=ind, axis=0, inplace=True)
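The `in` operator never interprets regex syntax; it looks for the literal characters. In a non-raw string literal, `'\b'` is also a backspace character rather than a word boundary. A minimal sketch that applies a real word-boundary regex through `Series.str.contains`, assuming the cells are strings like those shown in Edit #2 (the data below is hypothetical):

```python
import re

import pandas as pd

# Hypothetical data in the same string form as the column shown in Edit #2
df2 = pd.DataFrame({'elements': [
    '[Element B, Element Cr, Element Re]',
    '[Element B, Element Rh, Element Sc]',
    '[Element B, Element Mo, Element Y]',
    '[Element Al, Element B, Element Lu]',
    '[Element Mop, Element B, Element Lu]',
    '[Element B, Element Dy, Element Os]',
]})

target = 'Element Mo'
# Raw strings keep \b as a regex word boundary; re.escape guards against
# any regex metacharacters that might appear in the target
pattern = r'\b' + re.escape(target) + r'\b'

# 'Element Mop' does not match: there is no word boundary between 'o' and 'p'
mask = df2['elements'].str.contains(pattern, regex=True)
df2 = df2[~mask]
print(df2)
```

This relies on element names being delimited by non-word characters such as commas, spaces, and brackets.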
Edit #2: Here is the dictionary of the first 25 items of the column:
df2_dict = df2['elements'].head(25).to_dict()
{0: '[Element B, Element Cr, Element Re]', 1: '[Element B, Element Rh, Element Sc]', 2: '[Element B, Element Mo, Element Y]', 3: '[Element Al, Element B, Element Lu]', 4: '[Element B, Element Dy, Element Os]', 5: '[Element B, Element Fe, Element Sc]', 6: '[Element B, Element Cr, Element W]', 7: '[Element B, Element Ni]', 9: '[Element B, Element Pr, Element Re]', 10: '[Element B, Element Cr, Element V]', 11: '[Element B, Element Co, Element Si]', 12: '[Element B, Element Co, Element Yb]', 13: '[Element B, Element Lu, Element Yb]', 14: '[Element B, Element Ru, Element Yb]', 15: '[Element B, Element Mn, Element Pd]', 16: '[Element B, Element Co, Element Tm]', 17: '[Element B, Element Fe, Element W]', 19: '[Element B, Element Ru, Element Y]', 20: '[Element B, Element Ga, Element Ta]', 21: '[Element B, Element Ho, Element Re]', 22: '[Element B, Element Si]', 23: '[Element B, Element Ni, Element Te]', 24: '[Element B, Element Nd, Element S]', 25: '[Element B, Element Ga, Element Rh, Element Sc]', 26: '[Element B, Element Co, Element La]'}
The actual issue is that if I try to drop rows containing the string 'Element S' (in row 24), all entries with elements such as 'Element Sc' or 'Element Si' are also removed.
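Another option, assuming every cell is a string of the form '[Name, Name, ...]' with items separated by exactly ', ', is to parse each cell into a real list and then test for exact membership. `ast.literal_eval` would not help here because the names inside the brackets are not quoted. A minimal sketch using rows 22 through 25 from the dictionary above:

```python
import pandas as pd

# Rows 22 through 25 from the dictionary in Edit #2 (the cells are strings)
df2 = pd.DataFrame({'elements': {
    22: '[Element B, Element Si]',
    23: '[Element B, Element Ni, Element Te]',
    24: '[Element B, Element Nd, Element S]',
    25: '[Element B, Element Ga, Element Rh, Element Sc]',
}})

# Turn '[Element B, Element S]' into ['Element B', 'Element S']
# (assumes brackets only at the ends and ', ' as the separator)
as_lists = df2['elements'].str.strip('[]').str.split(', ')

# Exact item comparison: 'Element S' no longer matches 'Element Sc' or 'Element Si'
has_s = as_lists.apply(lambda items: 'Element S' in items)
df2 = df2[~has_s]
print(df2)
```

Storing `as_lists` back into the column would let later filters use plain list membership instead of string matching.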