How do I import images with filenames corresponding to column values in a dataframe?

Question

I'm a doctor trying to learn some code for work, and was hoping you could help me solve a problem I have with importing multiple images into Python.

I am working in Jupyter Notebook, where I have created a dataframe (named df_1) using pandas. In this dataframe each row represents a patient, and the first column shows the case number for each patient (e.g. 85).

Now, what I want to do is import multiple images (.bmp) from a given folder (same location as the .ipynb file). There are many images in this folder, and I do not want all of them, only the ones that have filenames corresponding to the "case_number" column in my dataframe (e.g. 85.bmp).

I already read this post, but I must admit it was way too complicated for me to understand.

Is there some simple loop (or something else) I could create to import all images with filenames corresponding to the values of the "case number" column in the dataframe?

I was imagining something like the below would be possible, I just do not know how to write it.

for i=[(df_1['case_number'()]
    cv2.imread('[i].bmp')

The images don't really need to be implemented in the dataframe, but I would like to be able to view them in my notebook by using e.g. plt.imshow(85) afterwards.

Here is an image of the head of my dataframe

Thank you for helping!

AS11 · Accepted Answer · 2021-06-24T20:33:40.803

You can access all of your files using this:

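import cv2
import matplotlib.pyplot as plt

imageList = []

# Iterate over the column values rather than df_1['case_number'][i], which looks
# rows up by index label and fails once rows have been filtered out.
for case in df_1['case_number']:
    imageList.append('./' + str(case) + '.bmp')

# x is a position in the list, starting at 0
plt.imshow(cv2.imread(imageList[x]))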

This loops through every value in the case_number column. The ./ prefix means that the file is in the current working directory, which for a notebook is normally the folder containing the .ipynb file. Converting each case number to a string and joining the pieces produces a readable file path such as ./85.bmp, which should open your desired file. The file paths are appended to the list so that they can be looked up later when displaying an image with plt.imshow().
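The KeyError reported in the comments below is probably related to how the rows are addressed. As a rough sketch, assuming df_1 was produced by filtering df_0 as described there: filtering keeps the original row labels, so looking up df_1['case_number'][i] with i counting from 0 can hit a label that no longer exists. The small dataframe here is a made-up stand-in for the real data, and the variable names paths, df_1_renumbered and second_case are only illustrative.

```python
import pandas as pd

# Hypothetical stand-in for the CSV data described in the question
df_0 = pd.DataFrame({
    "case_number": [85, 12, 40, 7],
    "Recurrence": ["yes", "no", "yes", "yes"],
})
df_1 = df_0.loc[df_0["Recurrence"] == "yes"]

# Filtering keeps the original row labels, so label 1 no longer exists
print(list(df_1.index))  # [0, 2, 3]

# Option 1: iterate over the values, ignoring the index entirely
paths = ["./" + str(case) + ".bmp" for case in df_1["case_number"]]

# Option 2: renumber the rows so that labels 0, 1, 2, ... exist again
df_1_renumbered = df_1.reset_index(drop=True)
second_case = df_1_renumbered["case_number"][1]
```

This would likely also explain why exporting to .xlsx and re-importing made the error disappear: reading the file back creates a fresh index starting at 0.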

If you would like to access the files based on their name, you can use another variable (which could be set as an input) and implement the code below

fileName = input('Enter Your Value: ')     
inputFile = imageList.index('./' + fileName + '.bmp')

From here, you can use the same plt.imshow() call, replacing x with the inputFile variable: plt.imshow(cv2.imread(imageList[inputFile])).
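Because list positions follow row order rather than case numbers, a dictionary keyed by case number may be more convenient for lookups such as "show case 85". The following is only a sketch under some assumptions: the helper names find_case_images and load_case_images are hypothetical, the case numbers are assumed to be whole numbers (so that 85 maps to 85.bmp), and the images are assumed to sit in the given folder.

```python
from pathlib import Path


def find_case_images(case_numbers, folder=".", extension=".bmp"):
    """Map each case number to its image path, skipping cases with no file."""
    paths = {}
    for case in case_numbers:
        path = Path(folder) / f"{case}{extension}"
        if path.is_file():
            paths[case] = path
    return paths


def load_case_images(paths):
    """Read every path with OpenCV and return {case_number: RGB array}."""
    import cv2  # imported here so the path helper above runs without OpenCV

    images = {}
    for case, path in paths.items():
        bgr = cv2.imread(str(path))  # returns None if the file cannot be read
        if bgr is not None:
            # OpenCV loads colour images as BGR; matplotlib expects RGB
            images[case] = cv2.cvtColor(bgr, cv2.COLOR_BGR2RGB)
    return images


# Usage in the notebook (df_1 and plt come from the question):
# images = load_case_images(find_case_images(df_1["case_number"]))
# plt.imshow(images[85])
```

Converting from BGR to RGB matters only for colour images; without it, matplotlib would show red and blue swapped.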

edited Jun 24 '21 at 20:33

answered Jun 23 '21 at 22:56

AS11


First of all, thank you for the swift answer. I tried your code, but I get an error message (KeyError: 5). However, I found that it works on the original dataframe df_0 (imported from .csv), from which I created df_1 by using the following code: ` df_1=df_0.loc[(df_0['Recurrence'] == 'yes')] ` Is there some reason this new df_1 would have different properties than the original df_0 read from the .csv? Also, I have trouble viewing the images by plt.imshow() after they have been read into my notebook (on df_0, where the images are successfully read). What should be within the () in imshow? – Apollon Jun 24 '21 at 10:12
This should help for plotting the image: https://stackoverflow.com/questions/35286540/display-an-image-with-python As for the dataframe, you can check whether they have the same properties by printing both dataframes and comparing them. – AS11 Jun 24 '21 at 14:34
Yes, the two dataframes look exactly the same, except some rows have been removed in df_1 due to my filtering by recurrence status (as was the intention). I have no problem displaying other imported images using `plt.imshow()`, but I just don't know what the references/names of the imported images are. I tried putting in one static reference myself: `case = cv2.imread('./' + str(df_0['case_number'][i]) + '.bmp')`, and when I then use `plt.imshow(case)` I do get the image for the last case displayed. However, I want this reference to be dynamic, so I can show the image for e.g. case 15. – Apollon Jun 24 '21 at 17:31
I don't completely understand your current problem, but from what I understood, would appending the filenames to a list, and then using the names from the list to show the images be what you are looking for? – AS11 Jun 24 '21 at 18:10
The problem with reading the df_1 dataframe disappeared when I first exported to .xlsx and then re-imported it using `df_1 = pd.read_excel('df_1.xlsx')`. Don't know why. But yes, I believe it would work if the filenames were appended to a list as you say. Let me try to explain using an image: [Image](https://imgur.com/EgCpliE) – Apollon Jun 24 '21 at 19:11
Yes, I believe the best solution is creating an empty list, and then using the for loop that you already created to append the names to the list. From here, you can show the images using the index of the filename that you use. The index can be typed manually, randomized, or calculated in a particular way that you want. You could set this to a variable `listname.index("value")` to find the index of a specific element (which you could determine using an input) and then pass this index/variable to the `plt.imshow(list[index])`, and this should give the expected output – AS11 Jun 24 '21 at 19:25
If you would like, I could update my answer to add this information – AS11 Jun 24 '21 at 19:26
**Beautiful, this was it!** With e.g. `plt.imshow(imagelist[5])` I now get the image corresponding to the 6th case (as cases start at 0). This works for all cases in the dataframe. I'm sure there's some way I could get the index to reflect the _case_numbers_ rather than the _row_, but that's not so important. I'll add another [image](https://imgur.com/T7ggKZ0) in case others are looking for the same solution with the complete code. And yeah, probably a good idea to add this last part to your answer. Thanks again! – Apollon Jun 24 '21 at 20:12
I added the new information, along with a possible solution to indexing based on the case number. If you found this helpful, please upvote and accept it as an answer – AS11 Jun 24 '21 at 20:35
