I'm currently teaching myself pandas and Python for machine learning. I've done fine with text data so far, but dealing with image data with my limited knowledge of Python and pandas is tripping me up.

I have read a .csv file into a pandas DataFrame, and one of its columns contains the URL of an image. This is what I see when I get info from the DataFrame:

import pandas

dataframe = pandas.read_csv("./sample.csv")
dataframe.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5000 entries, 0 to 4999
Data columns (total of 5 columns):
name 5000 non-null object
...
image 5000 non-null object

The image column contains the URL of each image. The problem is that I do not know how to load the image data from these URLs and store it as a NumPy array for processing.

Any help is appreciated. Thanks in advance!

Ishiro Kusabi
  • can you post a snippet of the csv – johnashu Sep 15 '17 at 15:09
  • Welcome to SO. Unfortunately this isn't a code writing service. If you haven't had the opportunity, please read [ask] and [mcve]. With a little research and studying the Python documentation you should find tools to help you *grab* an image from the web with a url. If you come up with a solution and get stuck, come back and ask. – wwii Sep 15 '17 at 15:13
  • Which version of Python are you using? Are you using the DataFrame for other purposes or is it just an intermediate step to parse the csv file? – wwii Sep 15 '17 at 15:42
  • Thanks johnashu and wwii! I've read through the links, and I apologize that my question was vague and information insufficient. I will try my best to better ask questions next time. Thanks for taking the time to read through my question! – Ishiro Kusabi Sep 15 '17 at 15:52

2 Answers

As we don't know your CSV file, you will have to tune your pd.read_csv() call for your case.
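
As a rough sketch of what that tuning might look like, the snippet below reads a small in-memory CSV and keeps only the columns of interest. The name and image column names come from the question; the sample rows, the price column, the example.com URLs, and the choice of usecols are assumptions made purely for illustration.

```python
import io

import pandas as pd

# Hypothetical stand-in for ./sample.csv (rows and URLs are made up)
csv_text = """name,price,image
Butterfly,1.5,https://example.com/butterfly.jpg
Birds,2.0,https://example.com/birds.jpg
"""

# usecols keeps only the columns needed for the image download step
df = pd.read_csv(io.StringIO(csv_text), usecols=["name", "image"])

# The image column is a plain Series of URL strings
urls = df["image"].tolist()
```

For a real file, the io.StringIO object would simply be replaced by the path to the CSV.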

Here I'm using requests to download some images in memory.

These are then decoded with the help of scipy (which you should already have; if not, you can use Pillow too).

The decoded images are then raw numpy arrays and are shown with matplotlib.

Keep in mind that we are not using temporary files here and everything is held in memory. Read also this (answer by jfs).
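
To illustrate the in-memory idea, here is a minimal sketch that decodes raw image bytes with Pillow instead of scipy. The helper name decode_image is hypothetical, and the bytes are generated locally so that the snippet runs without a network connection; in practice they would come from something like requests.get(url).content.

```python
import io

import numpy as np
from PIL import Image


def decode_image(data):
    """Decode encoded image bytes (JPEG, PNG, ...) into an RGB numpy array."""
    with Image.open(io.BytesIO(data)) as im:
        # convert("RGB") drops a possible alpha channel, so the result
        # always has shape (height, width, 3)
        return np.asarray(im.convert("RGB"))


# Build a tiny 4x3 RGBA PNG in memory to stand in for a downloaded file
buf = io.BytesIO()
Image.new("RGBA", (4, 3), (255, 0, 0, 128)).save(buf, format="PNG")

arr = decode_image(buf.getvalue())
print(arr.shape, arr.dtype)
```

Nothing is written to disk at any point: the encoded bytes live in a BytesIO buffer and the decoded pixels in a numpy array.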

For people missing some required libs, one should be able to do the same with (code needs to be changed of course):

I just selected some random images from a German news site.

Edit: Free images from Wikipedia are now used!

Code:

import requests                 # downloading images
import pandas as pd             # csv- / data-input
from scipy.misc import imread   # image-decoding -> numpy-array (imread was removed in SciPy 1.2)
import matplotlib.pyplot as plt # only for demo / plotting

# Fake data -> pandas DataFrame
urls_df = pd.DataFrame({'urls': ['https://upload.wikimedia.org/wikipedia/commons/thumb/c/cb/Rescue_exercise_RCA_2012.jpg/500px-Rescue_exercise_RCA_2012.jpg',
                                 'https://upload.wikimedia.org/wikipedia/commons/thumb/3/31/Clinotarsus_curtipes-Aralam-2016-10-29-001.jpg/300px-Clinotarsus_curtipes-Aralam-2016-10-29-001.jpg',
                                 'https://upload.wikimedia.org/wikipedia/commons/thumb/9/9f/US_Capitol_east_side.JPG/300px-US_Capitol_east_side.JPG']}) 

# Download & Decode
imgs = []
for i in urls_df.urls:               # iterate over column / pandas Series
    r = requests.get(i, stream=True) # See link for stream=True!
    r.raw.decode_content = True      # Content-Encoding
    imgs.append(imread(r.raw))       # Decoding to numpy-array

# imgs: list of numpy arrays with varying shapes of form (x, y, 3)
#     as we got 3-color channels
# Beware!: downloading png's might result in a shape of (x, y, 4)
#     as some alpha-channel might be available
# For more options: https://docs.scipy.org/doc/scipy/reference/generated/scipy.misc.imread.html

# Plot
f, arr = plt.subplots(len(imgs))
for i in range(len(imgs)):
    arr[i].imshow(imgs[i])
plt.show()
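
Since the decoded images have varying shapes, they cannot be combined into one array directly. The sketch below shows one possible way to bring them to a common size and stack them into a single (n, height, width, 3) array. The helper name to_batch, the 64x64 target size, and the use of Pillow for resizing are assumptions, not part of the answer.

```python
import numpy as np
from PIL import Image


def to_batch(images, size=(64, 64)):
    """Resize uint8 image arrays to a common (width, height) and stack them."""
    resized = []
    for img in images:
        pil_img = Image.fromarray(img).convert("RGB")  # normalise to 3 channels
        resized.append(np.asarray(pil_img.resize(size)))
    return np.stack(resized)


# Dummy arrays standing in for the downloaded images in `imgs`
dummy = [np.zeros((30, 50, 3), dtype=np.uint8),
         np.zeros((80, 40, 4), dtype=np.uint8)]
batch = to_batch(dummy)
print(batch.shape)
```

Resizing to a fixed size distorts the aspect ratio; whether that matters depends on the downstream model.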

Output:

(image: plot of the three downloaded pictures)

sascha
  • Thank you sascha! The decoding was really the part I needed help on. I am sorry the information was insufficient. I guess I'm currently at a stage where I don't know what I don't know so my questions ended up being vague. Thanks again for your help! – Ishiro Kusabi Sep 15 '17 at 15:47

If you want to download the images from the web, then, for example, rotate the images from your dataframe and save the results, you can use the following code:

import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
from PIL import Image
import urllib.request  # Python 3 (urllib2 on Python 2)
import io

df = pd.DataFrame({
    "name": ["Butterfly", "Birds"],
    "image": ["https://upload.wikimedia.org/wikipedia/commons/0/0c/Two-tailed_pasha_%28Charaxes_jasius_jasius%29_Greece.jpg",
              "https://upload.wikimedia.org/wikipedia/commons/c/c5/Bat_cave_in_El_Maviri_Sinaloa_-_Mexico.jpg"]})

def rotate_image(image, theta):
    """
    3D rotation matrix around the X-axis by angle theta
    """
    rotation_matrix = np.c_[
        [1,0,0],
        [0,np.cos(theta),-np.sin(theta)],
        [0,np.sin(theta),np.cos(theta)]
    ]
    # The matrix acts on the last (colour-channel) axis of the image array
    return np.einsum("ijk,lk->ijl", image, rotation_matrix)

for i, imageUrl in enumerate(df.image):
    print(imageUrl)
    fd = urllib.request.urlopen(imageUrl)      # download the image
    image_file = io.BytesIO(fd.read())         # keep the bytes in memory
    im = Image.open(image_file)
    im_rotated = rotate_image(np.asarray(im), np.pi)
    fig = plt.figure()
    plt.imshow(im_rotated)
    plt.axis('off')
    fig.savefig(df["name"].iloc[i] + ".jpg")   # file name taken from the name column

If instead you want to show the pictures you can do:

plt.show()

The resulting pictures, Birds and Butterfly, can also be seen here: Butterfly Birds
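
Note that rotate_image above applies the matrix to each pixel's colour values rather than turning the picture itself. If a spatial rotation is what is needed, one possible sketch uses np.rot90 on the decoded array; the tiny array here is made-up stand-in data, not one of the downloaded images.

```python
import numpy as np

# Stand-in for a decoded image: 2 rows, 3 columns, 3 colour channels
image = np.arange(2 * 3 * 3, dtype=np.uint8).reshape(2, 3, 3)

# Rotate by 90 degrees counter-clockwise in the (row, column) plane;
# the channel axis is left untouched
rotated = np.rot90(image, k=1, axes=(0, 1))

print(image.shape, "->", rotated.shape)
```

For angles that are not multiples of 90 degrees, Pillow's Image.rotate method is another option.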

Cedric Zoppolo
  • Thank you Cedric! I used a different method but this one also worked well and seems to be a lot cleaner than my method! Have a good day. Thanks again. – Ishiro Kusabi Sep 15 '17 at 19:07