I have a large number of PNG files where each filename is a unique ID with corresponding data in a large pandas DataFrame. I can get the filenames with os.listdir and then find the corresponding row with "index = df['image_id']==name". However, this is a very slow process. Is there a more efficient approach?

import os
files = os.listdir(path)
for file in files:
    name = file.split(".")[0]
    index = df['image_id']==name
    print(df.loc[index].values[0][1])
Roy
  • Your minimal reproducible example should include a minimal example of the DataFrame. [How to make good reproducible pandas examples](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) – wwii May 16 '21 at 03:26

1 Answer

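Each df['image_id']==name comparison scans the whole column, so the loop does that once per file. You could instead turn the filename list into a set of IDs and then use the isin method to get a boolean mask for all the matching rows at once. It is a little hard to be more specific as you didn't give us an example DataFrame to work with.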

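import os
files = os.listdir(path)
# Strip the extension from each filename to get the image IDs
names = {f.split('.')[0] for f in files}
# Boolean mask: True for every row whose image_id matches a file
mask = df['image_id'].isin(names)
# Select all matching rows in one step
matched = df.loc[mask]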
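
Since no example DataFrame was given, here is a rough, self-contained sketch with made-up data. The `label` column, the sample IDs, and the hard-coded `files` list (standing in for `os.listdir(path)`) are all assumptions for illustration only. It shows the `isin` approach and, as an alternative not mentioned above, indexing the frame by `image_id` once so that per-file lookups are cheap; that second option assumes the IDs are unique, as the question says.

```python
import pandas as pd

# Hypothetical data standing in for the real DataFrame.
df = pd.DataFrame({
    "image_id": ["a1", "b2", "c3", "d4"],
    "label": ["cat", "dog", "bird", "fish"],
})

# Hypothetical stand-in for os.listdir(path); "zz.png" has no row in df.
files = ["a1.png", "c3.png", "zz.png"]

# Strip the extension the same way the question does.
names = {f.split(".")[0] for f in files}

# Option 1: one vectorized pass that selects every matching row.
matched = df[df["image_id"].isin(names)]

# Option 2: index by image_id once, then look up each file individually.
# Assumes image_id values are unique.
by_id = df.set_index("image_id")
labels = {
    name: by_id.at[name, "label"]
    for name in names
    if name in by_id.index  # skip files without a row in df
}
```

Option 1 fits when you want all matching rows together; option 2 fits when you still need to handle the files one at a time, for example while loading each image.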
wwii