I am trying to build a dataset similar to the mnist.pkl.gz file used in Theano's logistic_sgd.py implementation. Here is my code:

import numpy as np
import csv
from PIL import Image
import gzip, cPickle
import theano
from theano import tensor as T

def load_dir_data(csv_file=""):
    print(" reading: %s" %csv_file)
    dataset=[]
    labels=[]

    cr=csv.reader(open(csv_file,"rb"))
    for row in cr:
        print row[0], row[1]
        try: 
            image=Image.open(row[0]+'.jpg').convert('LA') 
            pixels=[f[0] for f in list(image.getdata())]
            dataset.append(pixels)
            labels.append(row[1])
            del image 
        except IOError:  # only a missing/unreadable file; a bare except would hide other errors
            print("image not found")
    ret_val=np.array(dataset,dtype=theano.config.floatX)
    return ret_val,np.array(labels).astype(float)   


def generate_pkl_file(csv_file=""):
    Data, y =load_dir_data(csv_file)
    # Split the dataset into three parts (there were 2000 images).
    # Slice ends are exclusive, so each split starts where the previous one ended.
    train_set_x = Data[:1500]
    val_set_x = Data[1500:1750]
    test_set_x = Data[1750:1900]
    train_set_y = y[:1500]
    val_set_y = y[1500:1750]
    test_set_y = y[1750:1900]

    train_set = train_set_x, train_set_y
    val_set = val_set_x, val_set_y
    test_set = test_set_x, test_set_y

    dataset = [train_set, val_set, test_set]

    f = gzip.open('file.pkl.gz','wb')
    cPickle.dump(dataset, f, protocol=2)
    f.close()    


if __name__=='__main__':
    generate_pkl_file("trainLabels.csv") 

Error message:

Traceback (most recent call last):
  File "convert_dataset_pkl_file.py", line 50, in <module>
    generate_pkl_file("trainLabels.csv") 
  File "convert_dataset_pkl_file.py", line 29, in generate_pkl_file
    Data, y =load_dir_data(csv_file)
  File "convert_dataset_pkl_file.py", line 24, in load_dir_data
    ret_val=np.array(dataset,dtype=theano.config.floatX)
ValueError: setting an array element with a sequence.

The CSV file contains two fields: image name and classification label. When I run the same steps in the Python interpreter, they seem to work, as shown below, and I don't get the "setting an array element with a sequence" error.

---------python interpreter output----------

image=Image.open('sample.jpg').convert('LA')
pixels=[f[0] for f in list(image.getdata())]
dataset=[]
dataset.append(pixels)
dataset.append(pixels)
dataset.append(pixels)
dataset.append(pixels)
dataset.append(pixels)
b=numpy.array(dataset,dtype=theano.config.floatX)
b
array([[ 2.,  0.,  0., ...,  0.,  0.,  0.],
       [ 2.,  0.,  0., ...,  0.,  0.,  0.],
       [ 2.,  0.,  0., ...,  0.,  0.,  0.],
       [ 2.,  0.,  0., ...,  0.,  0.,  0.],
       [ 2.,  0.,  0., ...,  0.,  0.,  0.]])

Even though I am running logically the same set of instructions, when I run sample.py I get ValueError: setting an array element with a sequence. I am trying to understand this behavior. Any help would be great.

ssh99
  • Please always include the full error traceback in your question. – cel May 29 '15 at 05:36
  • Don't just tell us the error. Show us where it occurred. – hpaulj May 29 '15 at 05:48
  • Made edits. I tried with gdb, but there was no stack. – ssh99 May 29 '15 at 05:49
  • What does `dataset` look like - in both cases. We don't need all the values, but enough to see if there is a difference. – hpaulj May 29 '15 at 06:59
  • The CSV file has only two entries per line: image,level sample,2 10_left,0 10_right,0 13_left,0 13_right,0 15_left,1 15_right,2 16_left,4 16_right,4. If I load the images individually in the interpreter and append the pixels, I can call np.array(dataset,dtype=theano.config.floatX), but not when I run it from the file. – ssh99 May 29 '15 at 07:02
  • This is what dataset looks like. For example, dataset[0] is [2, 0, 0, 4, 0, 0, 7, 0, .... ,94, 85, 53, 31, 14, 1, 0, 6, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 0, 0, 27, 62, 70, 57, 38, 20, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0], with a total of 784 entries in each row. – ssh99 May 29 '15 at 07:27

1 Answer

The problem is probably similar to that of this question.

You're trying to create a matrix of pixel values with a row per image. But each image has a different size so the number of pixels in each row is different.

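You can't create a "jagged" float-typed array in NumPy: every row must have the same length. When the rows differ, np.array ends up trying to store a whole row (a sequence) in a single float element, which is what the error message is complaining about.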
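One way to check whether this is the cause is to count how many rows have each length before calling np.array. The sketch below is illustrative only: `row_length_counts` is a hypothetical helper (not part of NumPy or Theano), and the toy `dataset` stands in for the list of pixel lists built in the question.

```python
from collections import Counter


def row_length_counts(rows):
    """Return a dict mapping row length -> number of rows with that length."""
    return dict(Counter(len(row) for row in rows))


# Toy stand-in for the question's `dataset` (a list of pixel lists).
dataset = [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11, 12, 13]]

counts = row_length_counts(dataset)
print(counts)
if len(counts) > 1:
    print("rows have different lengths, so the data is jagged")
```

If the result has more than one key, at least two images produced a different number of pixels.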

You'll need to pad each row to the length of the largest image.
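A minimal sketch of that padding step, assuming the rows are plain Python lists as in the question and that zero is an acceptable fill value. `pad_rows` and `fill_value` are hypothetical names chosen for illustration, and the code is written for Python 3.

```python
def pad_rows(rows, fill_value=0):
    """Return copies of the rows, right-padded with fill_value to the longest length."""
    max_len = max((len(row) for row in rows), default=0)
    return [list(row) + [fill_value] * (max_len - len(row)) for row in rows]


# Toy stand-in for pixel lists taken from images of different sizes.
dataset = [[2, 0, 7], [5, 1], [9]]

padded = pad_rows(dataset)
print(padded)
```

The padded list can then be passed to np.array(padded, dtype=theano.config.floatX) as in the question. Padding a flattened image only appends values at the end of the row, so if the pixels need to line up across images, resizing every image to a common size before flattening (for example with PIL's Image.resize) may be a better fit.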

Daniel Renshaw
  • P.S. if you had just searched for 'numpy "setting an array element with a sequence"' the very first result (I see) in Google is the StackOverflow question I linked to. – Daniel Renshaw May 29 '15 at 08:22
  • No. All images are the same size. It works for me in the Python interpreter, but not when I run it as a file. – ssh99 May 29 '15 at 08:53
  • Your Python interpreter example shows the same image being added many times, not many different images. – Daniel Renshaw May 29 '15 at 08:57
  • Sorry, my bad. I have two different image sizes. I will change that and check it. Thanks. – ssh99 May 29 '15 at 09:03