"Python has stopped Working" when training a convolution neural net in tensorflow gpu

Question

This is a tensorflow code I wrote to test convolution neural net with only 1 convolution and pooling layer with only 1 fully connected layer of 512 neurons.

My dataset is of only 2 images: https://i.stack.imgur.com/7J1Bj.jpg and https://i.stack.imgur.com/CadII.jpg

When I train my network windows gives an pop up saying "Python has stoped" (ss: https://i.stack.imgur.com/3nK4u.jpg )

This is my code:

from scipy.misc import imread
import matplotlib.pyplot as plt
import numpy as np
import PIL.Image as Image

base_image = imread("base_image.jpg")
subject_image = imread("subject_image.jpg")

base_image = np.resize(base_image, [1024, 768, 3])
subject_image = np.resize(subject_image, [1024, 768, 3])

images = []

images.append(base_image)
images.append(subject_image)

# hyper parameters

epochs = 10
batch_size = 2
learning_rate = 0.01
n_classes = 2
import tensorflow as tf

# Model

x = tf.placeholder('float', [2, 1024, 768, 3])
y = tf.placeholder('float', [2])

weights = {
            "conv": tf.random_normal([10, 10, 3, 32]),
            "fc": tf.random_normal([-1, 512]), #7*7*64
            "out": tf.Variable(tf.random_normal([512, n_classes]))
}

conv = tf.nn.conv2d(x, filter=weights['conv'], strides=[1, 1, 1, 1], padding="SAME")
conv = tf.nn.max_pool(value=conv, ksize=[1,2,2,1], strides=[1,2,2,1], padding="SAME")

fc = tf.reshape(conv, shape=[2,-1])
fc = tf.nn.relu(tf.matmul(fc, weights['fc']))

output = tf.matmul(fc, weights['out'])

loss = tf.reduce_mean((output - y)**2)

train = tf.train.AdamOptimizer(learning_rate).minimize(loss)

sess = tf.Session()

sess.run(tf.global_variables_initializer())

for i in range(epochs):
    sess.run(train, feed_dict={x: images, y: [0, 1]})
    print(i)

Output:

2017-07-09 02:29:43.688699: W c:\tf_jenkins\home\workspace\release-win\m\windows-gpu\py\35\tensorflow\core\platform\cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE instructions, but these are available on your machine and could speed up CPU computations.
2017-07-09 02:29:43.689131: W c:\tf_jenkins\home\workspace\release-win\m\windows-gpu\py\35\tensorflow\core\platform\cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE2 instructions, but these are available on your machine and could speed up CPU computations.
2017-07-09 02:29:43.689504: W c:\tf_jenkins\home\workspace\release-win\m\windows-gpu\py\35\tensorflow\core\platform\cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE3 instructions, but these are available on your machine and could speed up CPU computations.
2017-07-09 02:29:43.689998: W c:\tf_jenkins\home\workspace\release-win\m\windows-gpu\py\35\tensorflow\core\platform\cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
2017-07-09 02:29:43.690380: W c:\tf_jenkins\home\workspace\release-win\m\windows-gpu\py\35\tensorflow\core\platform\cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
2017-07-09 02:29:43.690646: W c:\tf_jenkins\home\workspace\release-win\m\windows-gpu\py\35\tensorflow\core\platform\cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
2017-07-09 02:29:43.691117: W c:\tf_jenkins\home\workspace\release-win\m\windows-gpu\py\35\tensorflow\core\platform\cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX2 instructions, but these are available on your machine and could speed up CPU computations.
2017-07-09 02:29:43.691436: W c:\tf_jenkins\home\workspace\release-win\m\windows-gpu\py\35\tensorflow\core\platform\cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use FMA instructions, but these are available on your machine and could speed up CPU computations.
2017-07-09 02:29:44.197766: I c:\tf_jenkins\home\workspace\release-win\m\windows-gpu\py\35\tensorflow\core\common_runtime\gpu\gpu_device.cc:940] Found device 0 with properties: 
name: GeForce GTX 960M
major: 5 minor: 0 memoryClockRate (GHz) 1.176
pciBusID 0000:01:00.0
Total memory: 4.00GiB
Free memory: 3.35GiB
2017-07-09 02:29:44.198207: I c:\tf_jenkins\home\workspace\release-win\m\windows-gpu\py\35\tensorflow\core\common_runtime\gpu\gpu_device.cc:961] DMA: 0 
2017-07-09 02:29:44.198391: I c:\tf_jenkins\home\workspace\release-win\m\windows-gpu\py\35\tensorflow\core\common_runtime\gpu\gpu_device.cc:971] 0:   Y 
2017-07-09 02:29:44.198643: I c:\tf_jenkins\home\workspace\release-win\m\windows-gpu\py\35\tensorflow\core\common_runtime\gpu\gpu_device.cc:1030] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 960M, pci bus id: 0000:01:00.0)
2017-07-09 02:29:44.881448: W c:\tf_jenkins\home\workspace\release-win\m\windows-gpu\py\35\tensorflow\core\framework\op_kernel.cc:1158] Invalid argument: Dimension -1 must be >= 0
2017-07-09 02:29:44.881875: W c:\tf_jenkins\home\workspace\release-win\m\windows-gpu\py\35\tensorflow\core\framework\op_kernel.cc:1158] Invalid argument: Dimension -1 must be >= 0
     [[Node: random_normal_1/RandomStandardNormal = RandomStandardNormal[T=DT_INT32, dtype=DT_FLOAT, seed=0, seed2=0, _device="/job:localhost/replica:0/task:0/gpu:0"](random_normal_1/shape)]]
2017-07-09 02:29:44.882604: W c:\tf_jenkins\home\workspace\release-win\m\windows-gpu\py\35\tensorflow\core\framework\op_kernel.cc:1158] Invalid argument: Dimension -1 must be >= 0
     [[Node: random_normal_1/RandomStandardNormal = RandomStandardNormal[T=DT_INT32, dtype=DT_FLOAT, seed=0, seed2=0, _device="/job:localhost/replica:0/task:0/gpu:0"](random_normal_1/shape)]]
2017-07-09 02:29:45.362917: E c:\tf_jenkins\home\workspace\release-win\m\windows-gpu\py\35\tensorflow\stream_executor\cuda\cuda_dnn.cc:352] Loaded runtime CuDNN library: 6021 (compatibility version 6000) but source was compiled with 5105 (compatibility version 5100).  If using a binary install, upgrade your CuDNN library to match.  If building from sources, make sure the library loaded at runtime matches a compatible version specified during compile configuration.
2017-07-09 02:29:45.364154: F c:\tf_jenkins\home\workspace\release-win\m\windows-gpu\py\35\tensorflow\core\kernels\conv_ops.cc:671] Check failed: stream->parent()->GetConvolveAlgorithms(&algorithms) 
[Finished in 22.7s with exit code 3221226505]
[shell_cmd: python -u "E:\workspace_py\convolotion_neural_net.py"]
[dir: E:\workspace_py]
[path: C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v8.0\bin;C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v8.0\libnvvp;E:\Program Files\Python 3.5\Scripts\;E:\Program Files\Python 3.5\;C:\ProgramData\Oracle\Java\javapath;C:\Windows\system32;C:\Windows;C:\Windows\System32\Wbem;C:\Windows\System32\WindowsPowerShell\v1.0\;C:\Program Files (x86)\NVIDIA Corporation\PhysX\Common;C:\Python27\Scripts;C:\Program Files\Microsoft SQL Server\130\Tools\Binn\;E:\Program Files\MATLAB\runtime\win64;E:\Program Files\MATLAB\bin;E:\Program Files\MATLAB\polyspace\bin;C:\WINDOWS\system32;C:\WINDOWS;C:\WINDOWS\System32\Wbem;C:\WINDOWS\System32\WindowsPowerShell\v1.0\;C:\ProgramData\Oracle\Java\javapath;C:\Windows\system32;C:\Windows;C:\Windows\System32\Wbem;C:\Windows\System32\WindowsPowerShell\v1.0\;C:\Program Files (x86)\NVIDIA Corporation\PhysX\Common;C:\Python27\Scripts;C:\Program Files\Microsoft SQL Server\130\Tools\Binn\;E:\Program Files\MATLAB\runtime\win64;E:\Program Files\MATLAB\bin;E:\Program Files\MATLAB\polyspace\bin;C:\WINDOWS\system32;C:\WINDOWS;C:\WINDOWS\System32\Wbem;C:\WINDOWS\System32\WindowsPowerShell\v1.0\;C:\Users\guita\AppData\Local\Microsoft\WindowsApps;E:\Program Files\Python27\Scipts;e:\Program Files (x86)\Microsoft VS Code\bin;C:\Users\guita\AppData\Local\atom\bin]

My pc specs (Lenovo Y50): Nvidia GTX 960m 4 GB memory, Intel I7 4th gen, 8 GB RAM

Python 3.5 + Tensorflow with GPU

score 1 · Accepted Answer · answered Jul 09 '17 at 13:30

1

So I managed to fix it on my own. This happened because my cuDNN version was 6.0 but tensorflow works best 5.1. Also there's some minute errors in the code but that irrelevant :p

answered Jul 09 '17 at 13:30

ParmuTownley

957
2
14
34

score 0 · Answer 2 · answered Jul 09 '17 at 11:51

0

Your description 'smells' of lack of resources, most probably, memory.

If this happens every time you run the code, then the problem is probably in your code.
I suggest you profile its memory consumption with tools like memory_profiler or heapy.

If it happens only occasionally, then some other process, which runs concurrently with your Python process is consuming so many resources, that not enough is left for your Python process.

answered Jul 09 '17 at 11:51

boardrider

5,882
7
49
86

I managed to fix it. It was because of the cuDNN version. I've posted the answer for that. – ParmuTownley Jul 09 '17 at 13:31

"Python has stopped Working" when training a convolution neural net in tensorflow gpu

2 Answers2