
I'm starting an ML project, and before building a model I want to import my images (a folder of .png files I saved) and manipulate them so they fit a model. All I do is load the files and then try to show them, but nothing is shown. The file path seems to be right: the first time I tried, it was wrong and produced a big error message, but now it doesn't. How do I load the files so that I can run something like

data[0]

and see the first image (or details of the image)? My code is as follows (I imported a lot of other things from the TensorFlow guide above this code, so I don't think that's the problem, but I can edit in my other imports if necessary):

import pathlib
import sklearn.datasets
data_dir = sklearn.datasets.load_files('/Users/USer/Downloads/C4IMAGES/', shuffle='False')
data_dir

and the output of running this is:

{'data': [],
 'filenames': array([], dtype=float64),
 'target_names': [],
 'target': array([], dtype=float64),
 'DESCR': None}

If I try data_dir[0], which should show the first image, the error message is:

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-48-5541d6af8248> in <module>
      2 import sklearn.datasets
      3 data_dir = sklearn.datasets.load_files('/Users/USer/Downloads/C4IMAGES/', shuffle='False')
----> 4 data_dir[0]

KeyError: 0

Thanks for any help!

Tirth Patel
joelanbanks3
  • The documentation [here](https://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_files.html) for the function `sklearn.datasets.load_files` might help you. Also make sure that you are in the correct current working directory so that your path `/Users/USer/Downloads/C4IMAGES/` is accessible. – Imanpal Singh Dec 03 '19 at 03:54

1 Answer

According to its documentation, sklearn.datasets.load_files expects the data files (images or otherwise) to be arranged in the following hierarchy:

container_folder/
    category_1_folder/
        file_1.txt file_2.txt … file_42.txt
    category_2_folder/
        file_43.txt file_44.txt …

Your images appear to sit directly in /Users/USer/Downloads/C4IMAGES/. load_files only reads files inside subfolders of the container folder, which is why your result is empty. You need to create subfolders such as category_1 and category_2 inside it and put each image in the subfolder for its category. If your data is not categorized, create a single subfolder with any name and put all your images in it.

Now you can pass /Users/USer/Downloads/C4IMAGES/ to load_files, and it should load your data into the Python list data_dir['data'], with each file stored as raw bytes.
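The reorganization described above could be sketched as follows. This is a minimal sketch, not the only way to do it: the helper names `move_images_into_category` and `load_image_bytes` and the subfolder name `all_images` are made up for illustration, and it assumes every image is a .png file sitting directly in the container folder.

```python
import shutil
from pathlib import Path


def move_images_into_category(root, category="all_images", pattern="*.png"):
    """Move image files sitting directly in `root` into `root/category`.

    `category` is a hypothetical subfolder name; any name works when the
    data has no real categories.
    """
    root = Path(root)
    target = root / category
    target.mkdir(exist_ok=True)
    moved = []
    for path in sorted(root.glob(pattern)):
        if path.is_file():
            shutil.move(str(path), str(target / path.name))
            moved.append(path.name)
    return moved


def load_image_bytes(root):
    """Load every file under root/<category>/ as raw bytes via load_files."""
    # Imported here so the folder helper above also runs without scikit-learn.
    from sklearn.datasets import load_files

    # Pass the boolean False: the string 'False' is truthy, so it would
    # still shuffle the files.
    return load_files(str(root), shuffle=False)
```

Under these assumptions, calling `move_images_into_category('/Users/USer/Downloads/C4IMAGES/')` once and then `data_dir = load_image_bytes('/Users/USer/Downloads/C4IMAGES/')` should give a non-empty `data_dir['data']`.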

You can then decode each image from raw bytes into a NumPy array and display it:

import io

import matplotlib.pyplot as plt
import numpy as np
from PIL import Image

i = 0  # index of the image to decode and display

# decode the i-th image from raw bytes into a NumPy array
img = Image.open(io.BytesIO(data_dir.data[i]))
img = np.asarray(img)

# display the i-th image
plt.imshow(img)
plt.show()

References:
1. https://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_files.html
2. Convert image loaded as binary string into numpy array

Tirth Patel
  • OK, this works great! One last question: if I run your code in a for loop from 0 to len(data_dir), it only processes 6 images. What value do I need in order to loop over every image in data_dir? – joelanbanks3 Dec 03 '19 at 22:42
  • Careful there! `data_dir` is a dictionary with key-value pairs. If you want to process each image you will have to iterate over `data_dir['data']`. – Tirth Patel Dec 04 '19 at 05:14
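
The point in the last comment can be illustrated with a small sketch. The dictionary below is only a stand-in for the object returned by load_files, with made-up contents, and `decode_all` is a hypothetical helper name. It shows that `len(data_dir)` counts the keys of the container, while `len(data_dir['data'])` counts the loaded files.

```python
# Stand-in for the object returned by load_files (contents are made up).
data_dir = {
    "data": [b"raw-bytes-%d" % n for n in range(8)],
    "filenames": ["img_%d.png" % n for n in range(8)],
    "target_names": ["all_images"],
    "target": [0] * 8,
    "DESCR": None,
}

keys_count = len(data_dir)            # number of keys, not images
images_count = len(data_dir["data"])  # number of loaded files


def decode_all(raw_images):
    """Decode every raw image (bytes) into a NumPy array."""
    # Third-party imports kept local so the length check above runs alone.
    import io

    import numpy as np
    from PIL import Image

    return [np.asarray(Image.open(io.BytesIO(raw))) for raw in raw_images]
```

With real image data, `images = decode_all(data_dir['data'])` would cover every file, and a loop such as `for raw in data_dir['data']:` avoids index arithmetic altogether.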