I have a folder with 1976 training images. Each image has a shape (118,128,1) (greyscale images). I created an array with all the images like this:

import glob
import cv2
import numpy as np

# Flag 0 makes cv2.imread load each image as single-channel greyscale
images = [cv2.imread(path, 0) for path in glob.glob('rootDir/train/*.png')]
images = np.asarray(images)

which yields:

images 
out[0]  array([[[ 38,  47,  51, ...,  53,  53,  46],
            [ 48,  49,  50, ...,  53,  50,  51],
            [ 48,  51,  53, ...,  54,  50,  51],
            ...,
            [ 59,  61,  57, ..., 194, 195, 200],
            [ 76,  71,  65, ..., 212, 212, 199],
            [ 81,  80,  77, ..., 179, 184, 197]],
            ....

images.shape
out[1]: (1976, 128, 118)

Now, the labels for the images are stored in a CSV file in the following format:

id,appliance
1000,8
1001,1
1002,8
1003,1
1004,6
1005,1
1006,1
1007,2
1008
1009,5
1010
1011,3
1012,2
....

The id matches the filename of each image, and the "appliance" column contains the label assigned to each image for training.

To feed this data to a CNN model using CNTK, I need to convert it into a text format in which each row holds the one-hot encoded label followed by the image features. The expected output would look something like this:

|labels 0 0 0 1 0 0 0 0 0 0 |features 0 0 0 0 ... 
                                              (15104 integers each representing a pixel)

I'm totally lost and would appreciate any help on this.

EDIT IN RESPONSE TO DAN-MASEK'S COMMENT:

Hi Dan, here's the screenshot of the error:

Screenshot

As I said before, I set the variable ID_APP_MAP_FILENAME = 'train_labels.csv'. Tell me if you need any further information. Thank you.

Miguel 2488
  • First step, load the csv into a dictionary mapping ID to appliance. Then iterate over the paths returned by glob. Parse each path to get the ID from the file name. Look up corresponding appliance in the dictionary, generate the labels array for this image. Load the image and flatten it. Concatenate the two single row arrays. Finally stack them all vertically, to get an array of 1976 rows, one row per input. | I think you should be able to find existing guidance on how to do each of those steps either on SO, or on the net in general. – Dan Mašek Oct 03 '18 at 23:10
  • Or even better, since your CSV already contains all the labels (i.e. file names), you can even avoid the glob and parsing -- just generate filenames based on the IDs in the CSV. – Dan Mašek Oct 03 '18 at 23:36
  • Hi @DanMašek, thank you for your answer. What you describe seems the right way to go, but without a little more guidance on the steps I don't think I could get this done. Could you please show me a bit about how to follow these steps? Just the key steps are OK, you don't need to write a complete walkthrough, I know it's a lot. If you post it in an answer I would be happy to accept it as correct and upvote it. Thanks in advance – Miguel 2488 Oct 03 '18 at 23:37
  • Hi again @DanMašek, I can't really do that with the csv, since I have 2 images per observation, so one observation would have e.g.: 1000_c.jpg and 1000_v.jpg. I'm not sure if I can go about it the way you are saying – Miguel 2488 Oct 03 '18 at 23:41
  • The same label applies to both, and each ID has the `_c` and `_v` variant? Although parsing the id from the filename wouldn't be difficult in this case anyway -- split off the filename, and then split that on the underscore and interpret the first part as integer. – Dan Mašek Oct 03 '18 at 23:43
  • Have a look at [this script](https://pastebin.com/aBrEFRmM) -- does that do what you want and does it make sense? I can also write the variant that uses glob and parses the filenames to get the ids if you want. – Dan Mašek Oct 04 '18 at 00:02
  • Hi again @DanMašek. I really want to thank you for your help, you really put a lot of work in there. However, I get an error when I try to run your script. I edited my question and added a screenshot of the error so you can see what's happening. I'm setting the variable ID_APP_MAP_FILENAME = 'train_labels.csv', because I assume your script is meant to convert the csv into a dictionary that maps the id values to the images' names. But I still get a `TypeError`: `object of type 'map' has no len()`. Please tell me your thoughts about this. And thanks a lot again – Miguel 2488 Oct 04 '18 at 08:04
  • Hmm, are you using Python 3.x? What that statement should do is first split the line at the location of `,` into separate strings, and then apply `int` on each element, so that you get a list of integers. I guess the way it works in 3.x changed, will have a look at it when I get back. – Dan Mašek Oct 04 '18 at 08:55
  • Hello again Dan, I didn't know this was Python 2. I'll try it out in Python 2 and see if it works. If it does, I'll just save the output as a txt file and continue the modeling phase with Python 3, so I don't have to bother you again. I'll let you know as soon as I try it! Thanks again :) – Miguel 2488 Oct 04 '18 at 10:00
  • Hi @DanMašek, I tried your script in Python 2 and it works perfectly. If you can post this as an answer I will be happy to accept it and close this question. You really helped me with this, thank you very much for the time you've put into this. This community really needs more people like you!! – Miguel 2488 Oct 04 '18 at 20:48
  • Sounds good, thanks :) | BTW, [this question](https://stackoverflow.com/questions/1303347/getting-a-map-to-return-a-list-in-python-3-x) shows what to do to get a list after using `map` in Python3. (I know I'm a bit archaic, but there's so much stable code I've got for 2.7.x with little reason to upgrade just for sake of upgrading ..) | I'll write something up, might take a bit tho. – Dan Mašek Oct 04 '18 at 20:57
  • That's OK @DanMašek. Thank you again for your tips, no worries, take your time to write your answer. I'll be here to upvote and accept it :) – Miguel 2488 Oct 05 '18 at 07:34
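
The steps outlined in the first comment can be sketched roughly as follows. This is not the pastebin script linked above, only a minimal illustration under a few assumptions: ten classes (taken from the ten label slots in the example line of the question), a label value k mapping to position k, file names that start with the numeric id (such as `1000.png` or `1000_c.jpg`), and CSV rows without a label (such as 1008 in the sample) being skipped. The helper names (`load_label_map`, `id_from_path`, `one_hot`, `ctf_line`, `write_ctf`) are hypothetical and exist only for this example. Instead of stacking arrays, the sketch writes one text line per image, which matches the `|labels ... |features ...` output the question asks for.

```python
import csv
import glob
import os

NUM_CLASSES = 10  # assumption: the example line in the question has 10 label slots


def load_label_map(csv_path):
    """Read the 'id,appliance' CSV into {id: label}; rows with no label are skipped."""
    labels = {}
    with open(csv_path, newline='') as f:
        reader = csv.reader(f)
        next(reader, None)  # skip the header row
        for row in reader:
            if len(row) >= 2 and row[1].strip():
                labels[int(row[0])] = int(row[1])
    return labels


def id_from_path(path):
    """Extract the numeric id from names like '1000.png' or '1000_c.jpg'."""
    stem = os.path.splitext(os.path.basename(path))[0]
    return int(stem.split('_')[0])


def one_hot(label, num_classes=NUM_CLASSES):
    """Return a list of zeros with a single 1 at position `label`."""
    return [1 if i == label else 0 for i in range(num_classes)]


def ctf_line(label, pixels, num_classes=NUM_CLASSES):
    """Format one sample as '|labels ... |features ...'."""
    label_str = ' '.join(str(v) for v in one_hot(label, num_classes))
    feature_str = ' '.join(str(int(p)) for p in pixels)
    return '|labels {} |features {}'.format(label_str, feature_str)


def write_ctf(image_glob, csv_path, out_path):
    """Write one text line per labelled image found by `image_glob`."""
    import cv2  # imported here so the helpers above work without OpenCV

    labels = load_label_map(csv_path)
    with open(out_path, 'w') as out:
        for path in sorted(glob.glob(image_glob)):
            label = labels.get(id_from_path(path))
            if label is None:
                continue  # no label for this image in the CSV
            image = cv2.imread(path, 0)  # 0 = load as greyscale
            if image is None:
                continue  # file could not be read as an image
            out.write(ctf_line(label, image.flatten()) + '\n')
```

For instance, `ctf_line(3, [0, 0, 255])` gives `|labels 0 0 0 1 0 0 0 0 0 0 |features 0 0 255`. A call such as `write_ctf('rootDir/train/*.png', 'train_labels.csv', 'train_ctf.txt')` would then produce the whole file, where the output file name is just a placeholder.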

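
On the `TypeError` discussed in the comments: in Python 3, `map` returns a lazy iterator rather than a list, so calling `len()` on the result fails, while in Python 2 it returned a list. A small sketch of the usual fix is below. It assumes a statement of the kind Dan describes (split the line at `,`, then apply `int` to each part); the actual statement in the linked script is not reproduced here, and `parse_ints` is a hypothetical name.

```python
def parse_ints(line):
    """Split a comma-separated line and convert each field to int."""
    # list(...) turns the Python 3 map iterator into a real list,
    # so len() and indexing work as they did in Python 2.
    return list(map(int, line.split(',')))


fields = parse_ints('1000,8')
print(len(fields), fields)  # prints: 2 [1000, 8]
```

A list comprehension, `[int(part) for part in line.split(',')]`, is an equivalent alternative that behaves the same in both Python versions.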
0 Answers