I want to remove all rows of a dataframe whose image names are not in a list, using Python. Here is an example of the expected output for the following dataframe and list:

Dataframe:

ImageName    |   A    
0001.jpg     |   1
0002.jpg     |   1
0003.jpg     |   1
0004.jpg     |   1

List:
'0003.jpg'
'0001.jpg'

Output dataframe:
ImageName    |   A    
0001.jpg     |   1
0003.jpg     |   1

This is the code I made:

df = pd.read_csv("./imagenames.csv")
for index, row in df.iterrows():        
    a1=df.iloc[index][0]
    a2=train_gen_labels
    result=pd.Series(a1).isin(a2).any()
    if(result==False):
        df=df.drop(index)
    else : positive=positive+1

However, this code raises the error: IndexError: single positional indexer is out-of-bounds

Bia
  • instead of the for? If that's the case it doesn't work. It gives the error `KeyError: 0` – Bia May 30 '21 at 14:23

1 Answer


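You can avoid the for loop entirely. The error occurs because the loop drops rows from df while still indexing it by position with df.iloc[index], so once df has shrunk, the later positions no longer exist. Filtering with a boolean mask does the job in one step: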

df = df[df.ImageName.isin(train_gen_labels)]

df.ImageName.isin(train_gen_labels) creates a boolean mask that is True for each row whose ImageName value is in the train_gen_labels list. Indexing df with that mask keeps only those rows.
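
As a minimal sketch of the mask approach, here it is applied to the sample data from the question. It assumes train_gen_labels is a plain Python list of file names (the question does not show how it is built), and the names mask and filtered are only illustrative:

```python
import pandas as pd

# Sample data mirroring the question. In the real case df would come from
# pd.read_csv("./imagenames.csv"); train_gen_labels is ASSUMED to be a list of names.
df = pd.DataFrame({
    "ImageName": ["0001.jpg", "0002.jpg", "0003.jpg", "0004.jpg"],
    "A": [1, 1, 1, 1],
})
train_gen_labels = ["0003.jpg", "0001.jpg"]

# Boolean mask: True where the row's ImageName appears in the list.
mask = df["ImageName"].isin(train_gen_labels)

# Keep only the matching rows; reset_index is optional and renumbers rows from 0.
filtered = df[mask].reset_index(drop=True)

# Count of matching rows, which replaces the manual counter from the loop.
positive = int(mask.sum())

print(filtered)
print(positive)
```

The rows keep the order they have in the dataframe, not the order of the list. Summing the mask gives the number of matches, so the positive counter from the loop is not needed either.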

Ank