I have a Pandas DataFrame, df, with a path column containing paths to image files for analysis. Some of the images in this dataset do not actually exist, so I need to remove the rows with a nonexistent image path.
Currently, I am looping through the entire dataframe and reassigning it like so:
    for index, sample in df.iterrows():
        if not os.path.isfile(sample['path']):
            df = df.drop(index)
However, as my dataset contains tens of thousands of images, this is extremely slow.
I've also looked at using an approach like the one in this more general question:
    df = df.drop(df[not os.path.isfile(df['path'])].index)
However, this does not work, as os.path.isfile takes a single path and is not compatible with a Pandas Series.
I feel like there must be a better way to approach this problem. Any ideas?
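One direction that might work is to call os.path.isfile on each path through Series.map, build a boolean mask, and filter the DataFrame once instead of dropping rows one at a time. The following is only a minimal sketch under assumptions: the temporary directory, the file names, and the label column are hypothetical sample data and not part of my real dataset.

```python
import os
import tempfile

import pandas as pd

# Hypothetical sample data: one real temporary file and one missing path.
with tempfile.TemporaryDirectory() as tmp_dir:
    real_path = os.path.join(tmp_dir, "exists.png")
    open(real_path, "wb").close()
    missing_path = os.path.join(tmp_dir, "missing.png")

    df = pd.DataFrame({"path": [real_path, missing_path], "label": [0, 1]})

    # Call os.path.isfile once per path to build a boolean mask,
    # then keep only the rows whose file exists in a single filtering step.
    exists_mask = df["path"].map(os.path.isfile)
    df = df[exists_mask]

    kept_paths = df["path"].tolist()
```

This still performs one filesystem check per row, so any gain would come from avoiding the repeated df.drop calls, each of which builds a new DataFrame.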