
I am getting the following error

Process finished with exit code 137 (interrupted by signal 9: SIGKILL)

while trying to execute the following code on just 10,000 images. Considering the amount of data used these days, I don't think the image set is that big, and after adding some memory checks it does not seem that I am running out of memory, which is what I would have expected from exit code 137. A second pair of eyes would be appreciated!

The code

import os
import sys
from pathlib import Path

import numpy as np
import joblib
from tqdm import tqdm
from keras.preprocessing import image
from keras.applications import vgg16

# Path to the folder with training data
img_path = Path("training_data")

images = []
labels = []

# Load all the images
for img_name in tqdm(os.listdir(img_path)):
    # Load the image from disk
    img = image.load_img(img_path / img_name)

    # Convert the image to a numpy array
    image_array = image.img_to_array(img)

    # Add the image to the list of images
    print("Number of images " + str(len(images)))
    print("Memory size of images list " + str(sys.getsizeof(images)))
    images.append(image_array)

    # the expected value should be 0
    labels.append(0)

Output:

Memory size of images list 77848

4%|▍ | 8919/233673 [06:42<9:06:24, 6.86it/s]

Number of images 8919

Memory size of images list 77848

4%|▍ | 8920/233673 [06:42<11:26:09, 5.46it/s]

Process finished with exit code 137 (interrupted by signal 9: SIGKILL)

Basically, I am trying to extend this example of how to use the VGG16 model to extract features from your own images so that you can classify them later by finishing the model with a Dense layer with a sigmoid activation. The example works with 100 images, but now that I have a larger dataset, it falls short.

import sys
from pathlib import Path

import numpy as np
import joblib
from keras.preprocessing import image
from keras.applications import vgg16

# Paths to folders with training data (the subfolder names are assumed here;
# adjust them to your actual layout)
img_path = Path("training_data")
not_dog_path = img_path / "not_dogs"
dog_path = img_path / "dogs"

images = []
labels = []

# Load all the images
for img in not_dog_path.glob("*.png"):
    # Load the image from disk
    img = image.load_img(img)

    # Convert the image to a numpy array
    image_array = image.img_to_array(img)

    # Add the image to the list of images
    print("Number of images " + str(len(images)))
    print("Memory size of images list " + str(sys.getsizeof(images)))
    images.append(image_array)

    # the expected value should be 0
    labels.append(0)

# Load all the dog images
for img in dog_path.glob("*.png"):
    # Load the image from disk
    img = image.load_img(img)

    # Convert the image to a numpy array
    image_array = image.img_to_array(img)

    # Add the image to the list of images
    images.append(image_array)

    # For each 'dog' image, the expected value should be 1
    labels.append(1)

# Create a single numpy array with all the images we loaded
x_train = np.array(images)

# Also convert the labels to a numpy array
y_train = np.array(labels)

# Normalize image data to 0-to-1 range
x_train = vgg16.preprocess_input(x_train)

# Load a pre-trained neural network to use as a feature extractor
pretrained_nn = vgg16.VGG16(weights='imagenet', include_top=False, input_shape=(64, 64, 3))

# Extract features for each image (all in one pass)
features_x = pretrained_nn.predict(x_train)

# Save the array of extracted features to a file
joblib.dump(features_x, "x_train.dat")

# Save the matching array of expected values to a file
joblib.dump(y_train, "y_train.dat")
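For context, the "finish the model with a Dense layer of sigmoid" step I mention above is not shown in the snippet; roughly, it looks like the sketch below (the hidden layer size and epoch count are placeholders of mine, not values from the example):

import joblib
from keras.models import Sequential
from keras.layers import Flatten, Dense

# Load the features extracted by VGG16 above
x_train = joblib.load("x_train.dat")
y_train = joblib.load("y_train.dat")

# Small classification head trained on top of the frozen VGG16 features
model = Sequential()
model.add(Flatten(input_shape=x_train.shape[1:]))
model.add(Dense(256, activation='relu'))
model.add(Dense(1, activation='sigmoid'))

model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
model.fit(x_train, y_train, epochs=10, shuffle=True)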

RAM info

free -m
              total        used        free      shared  buff/cache   available
Mem:         386689      162686      209771          39       14231      222703
Swap:         30719        5156       25563

I ran the dmesg command, thank you @Matias Valdenegro for suggesting it:

[4550163.834761] Out of memory: Kill process 21996 (python) score 972 or sacrifice child
[4550163.836103] Killed process 21996 (python) total-vm:415564288kB, anon-rss:388981876kB, file-rss:1124kB, shmem-rss:4kB
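
Update: following the batching suggestion in the comments, this is the direction I am going in. A minimal sketch only, assuming the images sit directly in training_data as .png files; batch_size and the chunking loop are my own additions and not part of the original example:

from pathlib import Path

import joblib
import numpy as np
from keras.applications import vgg16
from keras.preprocessing import image

img_path = Path("training_data")
image_files = sorted(img_path.glob("*.png"))

# Load the feature extractor once, outside the loop
pretrained_nn = vgg16.VGG16(weights='imagenet', include_top=False,
                            input_shape=(64, 64, 3))

batch_size = 64  # placeholder value; tune to the available RAM
feature_batches = []

for start in range(0, len(image_files), batch_size):
    batch_files = image_files[start:start + batch_size]

    # Only this batch of raw images is ever held in memory
    batch = np.array([
        image.img_to_array(image.load_img(f, target_size=(64, 64)))
        for f in batch_files
    ])
    batch = vgg16.preprocess_input(batch)

    # Keep only the (much smaller) extracted features per batch
    feature_batches.append(pretrained_nn.predict(batch))

features_x = np.concatenate(feature_batches)
joblib.dump(features_x, "x_train.dat")

This way at most batch_size raw images are resident at any time, instead of all 233,673.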
  • With TensorFlow it might also be exceeding other limits, for example the number of thread-local storage entries. – scrutari Jan 11 '19 at 21:27
  • Note two things: you seem to have 233,673 images, not just 10,000, and you don't mention how much RAM you have, but it seems it is not enough. "Killed" indicates that the Linux kernel OOM killer was triggered; running the dmesg command after seeing this error will tell you more. It is likely that you don't have enough RAM to load this dataset, and you don't have to load the whole dataset into RAM anyway; you can just use batching. – Dr. Snoopy Jan 12 '19 at 01:58
  • You are right, I have 233,673 images that I would like to process, but I was going to batch them like you said, in 10k batches, yet I ran into this issue before hitting that limit. Hence the post. I just updated my post to include memory information per your comment. – krinker Jan 12 '19 at 02:20
  • I guess I need to do some investigation on how to train a transfer-learning model in batches properly so I won't hit such a memory constraint. I would really appreciate any references if someone is willing to share! – krinker Jan 12 '19 at 02:27

2 Answers


sys.getsizeof(x) returns the size of the list structure itself, not of its items. It is the data held in your list that is too big.

import sys

l = [0]
sys.getsizeof(l)
# 72
l[0] = list(range(1000000))
sys.getsizeof(l)
# 72 -- unchanged, because the nested list's contents are not counted
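
To measure what the list actually holds, sum the buffer sizes of the items themselves; for NumPy arrays that is the .nbytes attribute. A quick sketch (the array shapes here are just placeholders):

import sys

import numpy as np

images = [np.zeros((224, 224, 3), dtype=np.float32) for _ in range(10)]

# Size of the list object itself: only its pointer table, a few hundred bytes
print(sys.getsizeof(images))

# Actual payload: sum the buffer size of every array held in the list
print(sum(arr.nbytes for arr in images))  # 10 * 224 * 224 * 3 * 4 bytes = 6,021,120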
  • Do you mean list size, as in ~8k elements? According to https://stackoverflow.com/questions/855191/how-big-can-a-python-array-get I still have some room to grow: "Therefore the maximum size of a python list on a 32 bit system is 536,870,912 elements." Do you know how to get the whole size? I do think that while I am OK on the list size, the elements in that list are way too big, so that might be the issue. – krinker Jan 12 '19 at 01:10
  • It is the images that take the space, not the list. You cannot load so many images. – DYZ Jan 12 '19 at 04:00

In most cases, exit code 137 is caused by excessive memory usage or by an issue within multiprocessing.
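
If you want to confirm from inside Python that memory is the limit being hit (rather than, say, a multiprocessing issue), one option on a Unix system is the standard-library resource module; a small sketch (note that ru_maxrss is in kilobytes on Linux but in bytes on macOS):

import resource

# Peak resident set size of the current process so far
peak_kb = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
print("peak RSS: %.1f MiB" % (peak_kb / 1024))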
