In Python, I have a list of 68K files named files_in_folder. Additionally, I have a CSV file (loaded as a pandas DataFrame) with filenames and extensions. An example:
import numpy as np
import pandas as pd
import os

files_in_folder = ['2.fds', '4.fds', '5.jpg']
df = pd.DataFrame({'filename': ['1.fds', '2.fds', '3.fds', '4.fds', '5.jpg'],
                   'correct_extension?': [None, None, None, None, None],
                   'extension': ['.fds', None, '.json', '.fds', '.jpg']
                   })
For every item in the list, I check whether the file is in column 'filename'. If that row's 'extension' column holds the file's actual extension, True should be written to column 'correct_extension?' in that row.
On Stack Overflow I found NumPy's 'where', which could do something like this:
for file in files_in_folder:
    # os.path.splitext returns a (root, ext) tuple; index 1 is the extension
    extension = os.path.splitext(file)[1]
    df['correct_extension?'] = np.where((df['filename'] == file) & (df['extension'] == extension), True, False)
However, because the loop reassigns the whole column on every iteration, only the result for the last file survives, so this method doesn't give the expected results (below). I am looking for someone who can give me a hint on how to solve this problem, preferably with a loop.
I'm very eager to learn from you.
Expected result (DataFrame):
'filename': ['1.fds', '2.fds', '3.fds', '4.fds', '5.jpg'],
'correct_extension?': [None, None, None, True, True],
'extension': ['.fds', None, '.json', '.fds', '.jpg']
A similar topic I found was: Pandas: How do I assign values based on multiple conditions for existing columns?
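
One possible way to keep the loop is sketched below. It is not necessarily the best approach: instead of reassigning the entire column with np.where on each pass, it builds a boolean mask per file and uses df.loc to set True only on the matching rows, so results from earlier iterations are preserved. The variable names mask and result are introduced here for illustration only, and the example data is the same as above.

```python
import os
import pandas as pd

files_in_folder = ['2.fds', '4.fds', '5.jpg']
df = pd.DataFrame({
    'filename': ['1.fds', '2.fds', '3.fds', '4.fds', '5.jpg'],
    'correct_extension?': [None, None, None, None, None],
    'extension': ['.fds', None, '.json', '.fds', '.jpg'],
})

for file in files_in_folder:
    # splitext returns (root, ext); keep only the extension, e.g. '.fds'
    extension = os.path.splitext(file)[1]
    # Rows where both the filename and the recorded extension match this file
    mask = (df['filename'] == file) & (df['extension'] == extension)
    # Update only the matching rows; all other rows keep their current value
    df.loc[mask, 'correct_extension?'] = True

result = df['correct_extension?'].tolist()
print(df)
```

With the example data, only the rows for 4.fds and 5.jpg are set to True; 2.fds is in the folder but its 'extension' value is None, so it is left untouched. For a list of 68K files a loop like this may be slow, so it is probably best seen as a starting point rather than a final solution.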