
I'm trying to plot the image size distribution of a large folder containing thousands of pictures (uploaded to my local Jupyter notebook). All images have the .PNG extension.

I need to create a pandas DataFrame that looks like this:

                   **Size**
df = [[filename1,   1200 800],
      [filename2,   1100 850],
      [filename3,   1200 800],
      ....]

I have tried many methods, and I'm stuck on this last one, which seems like a promising path:

# load all images in a directory
from os import listdir
from matplotlib import image

loaded_images = list()
for filename in listdir('MyImagesFolder/'):
    # load image (same folder name as in the listdir call above)
    img_data = image.imread('MyImagesFolder/' + filename)
    # store loaded image
    loaded_images.append(img_data)
    print('> loaded %s %s' % (filename, img_data.shape))

Result:

> loaded Anchusa italica buglosse italien 05-05-2009 13-42-33.png (600, 800, 3)
> loaded Anchusa italica buglosse italien 05-05-2009 13-42-55.png (600, 800, 3)
> loaded Anchusa italica buglosse italien 05-05-2009 13-43-09.png (600, 800, 3)
> loaded Anchusa italica buglosse italien 05-05-2009 13-43-13.png (600, 800, 3)
> loaded Anchusa italica buglosse italien 05-05-2009 13-43-19.png (600, 800, 3)
> loaded Anchusa italica buglosse italien 05-05-2009 13-43-49.png (600, 800, 3)
> loaded Anchusa italica buglosse italien 05-05-2009 13-43-55.

Then

import pandas as pd
import matplotlib.pyplot as plt

image_size_df = pd.DataFrame(data=loaded_images)

But the result is:

print(loaded_images)
[array([[[0.34901962, 0.40392157, 0.25882354],
        [0.34901962, 0.4117647 , 0.25882354],
        [0.34117648, 0.41568628, 0.25882354],
        ...,
        [0.85882354, 0.84313726, 0.8039216 ],
        [0.85882354, 0.84313726, 0.8       ],
        [0.8627451 , 0.84313726, 0.79607844]],

I'm a total newbie at image data extraction and manipulation, and I've spent more than a day looking for a solution :/ Thanks for your help!

Abel

1 Answer


The issue with your current code is that image.imread isn't collecting the information you think it is. The API documentation (https://matplotlib.org/api/image_api.html) says, under matplotlib.image.imread, that it reads the image data into an array. That array is not a summary of the image size; it is the actual pixel data used to draw the image, which is why your printed output shows pixel values.
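To illustrate the point, here is a minimal sketch that uses a stand-in NumPy array in place of what image.imread returns (the zero-filled array is an assumption for illustration, not real image data). The array holds the pixels, while its shape attribute is ordered (height, width, channels), which matches the (600, 800, 3) lines printed in the question.

```python
import numpy as np

# Stand-in for the array image.imread returns for an 800x600 RGB PNG
# (hypothetical data: zeros instead of real pixel values).
img_data = np.zeros((600, 800, 3), dtype=np.float32)

# The array content is pixel data; only its shape describes the dimensions.
# Slicing with [:2] also works for grayscale (2-D) and RGBA (4-channel) arrays.
height, width = img_data.shape[:2]
print(f"width={width}, height={height}")
```

So storing img_data.shape[:2] per file would give the sizes too, at the cost of decoding every image fully.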

It looks like you have the file name part covered, but you might want to look into the PIL module to get the image sizes; see this related SO post: "How do I get the picture size with PIL?"

From there, you'll want to create lists for file name, image width, and image height that you can combine into a pandas dataframe.
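The steps above could be sketched roughly as follows, assuming Pillow (the maintained PIL fork) and pandas are installed. The function name image_sizes_dataframe and the column names are hypothetical choices for illustration, not part of either library.

```python
from pathlib import Path

import pandas as pd
from PIL import Image


def image_sizes_dataframe(folder):
    """Return a DataFrame with one row per PNG file: filename, width, height."""
    filenames, widths, heights = [], [], []
    for path in sorted(Path(folder).iterdir()):
        # Skip anything that is not a PNG file (case-insensitive extension check).
        if not path.is_file() or path.suffix.lower() != ".png":
            continue
        # Image.size is a (width, height) tuple; the with-block closes the file.
        with Image.open(path) as img:
            width, height = img.size
        filenames.append(path.name)
        widths.append(width)
        heights.append(height)
    # Combine the three lists into one DataFrame, one column per list.
    return pd.DataFrame({"filename": filenames, "width": widths, "height": heights})
```

With the folder name from the question, df = image_sizes_dataframe('MyImagesFolder') would build the table, and df.groupby(['width', 'height']).size() would then count how many images share each size, which is one possible starting point for plotting the distribution.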

Laura