I have a DataFrame with 40108 rows and a folder containing 997 image files (a sample of the full set of 40108 pictures). Each image's file name is the value from the 'imdbId' column of the df plus the .jpg suffix.
I would like to drop all rows in my df where the value in the imdbId column has no corresponding file in my folder, and keep the rest. That means 997 rows should be left after running the code.
Example:
Position 1 in the df is 114709. A picture with name 114709.jpg doesn't exist in the folder, meaning this row should be dropped.
Position 2 in the df is 113497. A picture with name 113497.jpg exists in the folder. This row should remain. ... and so on for all rows.
I have been trying to build a boolean index using a for/if loop with os.path.isfile, but I can't work out how to insert the imdbId from the df into the condition correctly.
Example from my notebook, with the file name hard-coded:
import os

exists = os.path.isfile('moviegenre/SampleMoviePosters/114709.jpg')
if exists:
    pass  # Do nothing, let the row remain.
else:
    pass  # Drop row
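A rough sketch of the direction I think this should go, instead of checking one hard-coded file at a time: list the folder once, strip the suffix from each file name, and keep only the rows whose imdbId appears in that set. The function name keep_rows_with_images and its parameters are hypothetical names I made up for illustration. It assumes imdbId holds integers or strings whose string form matches the file names exactly (no leading zeros or float formatting).

```python
import os
import pandas as pd

def keep_rows_with_images(df, folder, id_col="imdbId", ext=".jpg"):
    """Return only the rows of df whose id has a matching image file in folder.

    Hypothetical helper: assumes str(id) + ext is exactly the file name.
    """
    # Collect the file stems once, e.g. "113497" from "113497.jpg".
    stems = {
        os.path.splitext(name)[0]
        for name in os.listdir(folder)
        if name.lower().endswith(ext)
    }
    # Compare as strings so integer ids can match the file names.
    mask = df[id_col].astype(str).isin(stems)
    return df[mask]
```

With my folder this would presumably be called as df = keep_rows_with_images(df, 'moviegenre/SampleMoviePosters'), after which len(df) should be 997 if every file has a matching row.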
Some help would be greatly appreciated. Thanks in advance.