
Here is the current code:

import glob

import numpy as np
from skimage import color, io

# KNN is assumed to be a classifier (e.g. sklearn's KNeighborsClassifier)
# already fitted on ab-channel values elsewhere.

def load_data():
    files = glob.glob('../manga-resized/sliced_images/*.png')
    L = []
    target_dist = []
    for i, fl in enumerate(files, 1):
        image = color.rgb2lab(io.imread(fl))
        L.append(image[:, :, :1])                  # keep only the lightness channel
        ab = np.vstack(image[:, :, 1:])            # flatten (H, W, 2) -> (H*W, 2)
        target_dist.append(KNN.predict_proba(ab))  # per-pixel class distribution
        print i
    print "finished creating L and target_dist"
    X = np.asarray(L)
    y = np.asarray(target_dist)
    # remember to .transpose these later to 0,3,1,2
    print 'X shape: ', X.shape, 'y shape: ', y.shape
    return X, y

Currently I get the Killed: 9 message after i=391. My computer has 16GB of RAM, but I think I am somehow doing this really inefficiently. Eventually I hope to do this with nearly 1 million files, let alone 400, and I know people train on much larger datasets than 400 files, so this should be possible. So how am I screwing this up? Is there some memory leak? I thought those couldn't happen in Python. Is there any other reason for the Killed: 9 error?
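For scale, here is a quick back-of-envelope estimate of how fast target_dist can grow. The image dimensions and class count below are purely illustrative assumptions (substitute your real values); predict_proba returns one float64 per class per pixel:

    # Hypothetical sizes -- replace with your actual image dimensions
    # and the number of classes the KNN was fitted on.
    H, W, n_classes = 256, 256, 300

    bytes_per_image = H * W * n_classes * 8   # float64 = 8 bytes
    print 'per image: %.0f MB' % (bytes_per_image / 1e6)
    print 'for 400 images: %.0f GB' % (400 * bytes_per_image / 1e9)
    # With these assumed sizes: ~157 MB per image, ~63 GB for 400
    # images -- far past 16GB long before 1 million files.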

thanks

Edit: here is the result of ulimit -a:

Alexs-MBP-6:manga-learn alex$ ulimit -a
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
file size               (blocks, -f) unlimited
max locked memory       (kbytes, -l) unlimited
max memory size         (kbytes, -m) unlimited
open files                      (-n) 256
pipe size            (512 bytes, -p) 1
stack size              (kbytes, -s) 8192
cpu time               (seconds, -t) unlimited
max user processes              (-u) 709
virtual memory          (kbytes, -v) unlimited

Here is the output with memory usage printed, after file 221:

https://bpaste.net/show/26109a193e43. Clearly the available memory is decreasing, but there is still some left by the time the process gets Killed: 9.

Edit 2: I have seen in other places that np.asarray is very inefficient. Additionally, when I take that part out of the code, it does just fine and does not get killed. I have seen alternatives such as np.fromiter, but those only cover 1D arrays, not the two 4-dimensional arrays that need to be returned here, X and y. Does anyone know the correct numpy way to fill these arrays?
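For what it's worth, one common pattern is to preallocate the output arrays with np.empty and fill them in place, so the list of per-file arrays and the np.asarray copy never coexist in memory. A minimal sketch, assuming every image shares the same dimensions (H, W and n_classes are hypothetical placeholders; KNN is the fitted classifier from the question):

    import glob

    import numpy as np
    from skimage import color, io

    files = glob.glob('../manga-resized/sliced_images/*.png')
    H, W, n_classes = 256, 256, 300     # assumed, fixed dimensions

    X = np.empty((len(files), H, W, 1))
    y = np.empty((len(files), H * W, n_classes))

    for i, fl in enumerate(files):
        image = color.rgb2lab(io.imread(fl))
        X[i] = image[:, :, :1]
        y[i] = KNN.predict_proba(image[:, :, 1:].reshape(-1, 2))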

BigBoy1337
  • If you're running out of memory, you'll get something like `abort trap: 8`, not `killed: 9`; this is weird... – ForceBru May 22 '16 at 17:24
  • Do you really need to load *all* of the images into RAM at the same time? Normally you'd process one at a time and then unload it (a minimal sketch follows these comments). – o11c May 22 '16 at 18:58
  • I thought that was what I was doing? With `for fl in files:` and `image = color.rgb2lab(io.imread(fl))`, it only loads one file at a time into image. The arrays that are growing are L and target_dist, but I'm not sure there's a way around that? – BigBoy1337 May 22 '16 at 19:01
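As a concrete sketch of the one-at-a-time idea from the comment above, a generator lets each image become garbage as soon as the consumer moves on (identifiers reused from the question; nothing accumulates):

    def iter_data(files):
        # Yield one (lightness, class-distribution) pair per file
        # instead of building ever-growing lists.
        for fl in files:
            image = color.rgb2lab(io.imread(fl))
            yield image[:, :, :1], KNN.predict_proba(np.vstack(image[:, :, 1:]))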

1 Answer


Signal 9 is an external kill signal.

You can get this if you are running in a resource-restricted environment, e.g., after ulimit -t 1, which limits CPU time to 1 second. Trying this on my Mac, it politely reports "Cputime limit exceeded"; I don't know what it will report on Linux.
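To see this from Python itself, a minimal sketch using the standard-library resource module (the exact signal and message differ between macOS and Linux):

    import resource

    # Equivalent in spirit to `ulimit -t 1`: cap CPU time at 1 second.
    # Exceeding the hard limit gets the process terminated by the kernel.
    resource.setrlimit(resource.RLIMIT_CPU, (1, 1))

    while True:     # busy-loop until the kernel kills us
        pass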

Try ulimit -a from the command line in the environment where the program is being run to see if there is anything there.

Neapolitan
  • I added the output to the question. What am I looking for here? Is there a way to unlimit processing time? – BigBoy1337 May 22 '16 at 18:22
  • There is, but you have "cpu time" unlimited, so that is not the explanation. You could try printing memory usage on each loop iteration (a minimal sketch follows these comments): http://stackoverflow.com/questions/276052/how-to-get-current-cpu-and-ram-usage-in-python – Neapolitan May 22 '16 at 18:39
  • It doesn't appear that the memory usage truly runs out: https://bpaste.net/show/26109a193e43. There is still available memory; of course it is shrinking and the amount used constantly increases, but I think that's to be expected since it's building an array, right? – BigBoy1337 May 22 '16 at 18:50
  • The comments in this answer suggest looking in /var/log/kern.log; other log files may contain useful info. http://stackoverflow.com/a/726762/6195051 – Neapolitan May 22 '16 at 19:00
  • Hmm, I don't have those log files. I am running this on OS X though, so I'm not sure that advice applies. I do have a kernel_task running when I check top -o MEM, using 917MB. Perhaps that kernel_task is the one killing my process? I'm not sure if that thing is always running. What if I just killed that? – BigBoy1337 May 22 '16 at 19:10
  • Doesn't sound like kernel_task: http://apple.stackexchange.com/questions/37366/what-exactly-does-kernel-task-do – Neapolitan May 22 '16 at 19:41
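Along the lines of the linked answer above, a minimal per-iteration memory check using the third-party psutil package (pip install psutil):

    import os

    import psutil

    process = psutil.Process(os.getpid())

    def print_mem(i):
        # Resident set size of this process, in MB
        print i, process.memory_info().rss / 1e6, 'MB'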