
I've created a virtual notebook on Paperspace cloud infrastructure with a TensorFlow GPU P5000 virtual instance on the backend. When I start training my network, it runs 2x SLOWER than on my MacBook Pro with a pure CPU runtime. How can I make sure that my Keras NN is using the GPU instead of the CPU during training?

Please find my code below:

from tensorflow.contrib.keras.api.keras.models import Sequential
from tensorflow.contrib.keras.api.keras.layers import Dense
from tensorflow.contrib.keras.api.keras.layers import Dropout
from tensorflow.contrib.keras.api.keras import utils as np_utils
import numpy as np
import pandas as pd

# Read data
pddata = pd.read_csv('data/data.csv', delimiter=';')

# Helper function (split into train & test data)
def split_to_train_test(data):
    trainLength = len(data) - len(data)//10

    trainData = data.loc[:trainLength].sample(frac=1).reset_index(drop=True)
    testData = data.loc[trainLength+1:].sample(frac=1).reset_index(drop=True)

    trainLabels = trainData.loc[:,"Label"].as_matrix()
    testLabels = testData.loc[:,"Label"].as_matrix()

    trainData = trainData.loc[:,"Feature 0":].as_matrix()
    testData  = testData.loc[:,"Feature 0":].as_matrix()

    return (trainData, testData, trainLabels, testLabels)

# prepare train & test data
(X_train, X_test, y_train, y_test) = split_to_train_test(pddata)

# Convert labels to one-hot notation
Y_train = np_utils.to_categorical(y_train, 3)
Y_test  = np_utils.to_categorical(y_test, 3)

# Define model in Keras
def create_model(init):
    model = Sequential()
    model.add(Dense(101, input_shape=(101,), kernel_initializer=init, activation='tanh'))
    model.add(Dense(101, kernel_initializer=init, activation='tanh'))
    model.add(Dense(101, kernel_initializer=init, activation='tanh'))
    model.add(Dense(101, kernel_initializer=init, activation='tanh'))
    model.add(Dense(3, kernel_initializer=init, activation='softmax'))
    return model

# Train the model
uniform_model = create_model("glorot_normal")
uniform_model.compile(loss='categorical_crossentropy', optimizer='sgd', metrics=['accuracy'])
uniform_model.fit(X_train, Y_train, batch_size=1, epochs=300, verbose=1, validation_data=(X_test, Y_test)) 
Yury Kochubeev
  • Possible duplicate of [How to tell if tensorflow is using gpu acceleration from inside python shell?](https://stackoverflow.com/questions/38009682/how-to-tell-if-tensorflow-is-using-gpu-acceleration-from-inside-python-shell) – Emmet B Apr 21 '18 at 12:48
  • Not sure if the best way, but create a HUGE batch and train with it. If it brings an OOM error, it's GPU, if it freezes your machine it's CPU – Daniel Möller Apr 21 '18 at 12:54
  • Another thing you can try is to force a GPU device with: `with tf.device('/gpu:0'):` before declaring your model (a minimal sketch follows these comments). – Daniel Möller Apr 21 '18 at 12:56
  • The same behaviour - slower execution when I use batch_size=32 or even 64. Two times slower than on the MacBook Pro with the same settings, on pure CPU – Yury Kochubeev Apr 21 '18 at 13:00
  • I changed the code to run with `with tf.device('/gpu:0'):` but execution time is still very slow in comparison with my MacBook Pro... – Yury Kochubeev Apr 21 '18 at 13:12
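For reference, here is a minimal sketch of the `tf.device` suggestion from the comments, assuming the TF 1.x API used in the question and that a GPU is actually visible as `/gpu:0`:

import tensorflow as tf

# Pin every op created inside this scope to the first GPU;
# without soft placement, TensorFlow raises an error at run
# time if an op cannot be placed there.
with tf.device('/gpu:0'):
    model = create_model("glorot_normal")
    model.compile(loss='categorical_crossentropy', optimizer='sgd', metrics=['accuracy'])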

3 Answers


You need to run your network with log_device_placement = True set in the TensorFlow session (the line before the last in the sample code below). Interestingly enough, if you set that in a session, it will still apply when Keras does the fitting. So the (tested) code below does output the placement for each tensor. Please note, I've short-circuited the data reading because your data wasn't available, so I'm just running the network with random data. This way the code is self-contained and runnable by anyone. Another note: if you run this from a Jupyter notebook, the log_device_placement output will go to the terminal where the Jupyter notebook was started, not to the notebook cell's output.

from tensorflow.contrib.keras.api.keras.models import Sequential
from tensorflow.contrib.keras.api.keras.layers import Dense
from tensorflow.contrib.keras.api.keras.layers import Dropout
from tensorflow.contrib.keras.api.keras import utils as np_utils
import numpy as np
import pandas as pd
import tensorflow as tf

# Read data
#pddata = pd.read_csv('data/data.csv', delimiter=';')
pddata = "foobar"  # dummy placeholder; the real data isn't available

# Helper function (short-circuited to return random data of the right shapes)
def split_to_train_test(data):

    return (
        np.random.uniform(size=(100, 101)),          # X_train: 100 samples, 101 features
        np.random.uniform(size=(100, 101)),          # X_test
        np.random.randint(0, high=3, size=(100,)),   # y_train: labels in {0, 1, 2}
        np.random.randint(0, high=3, size=(100,))    # y_test
    )

    # NOTE: the original body below is unreachable after the early return above;
    # it's kept only to show the question's data preparation.
    trainLength = len(data) - len(data)//10

    trainData = data.loc[:trainLength].sample(frac=1).reset_index(drop=True)
    testData = data.loc[trainLength+1:].sample(frac=1).reset_index(drop=True)

    trainLabels = trainData.loc[:,"Label"].as_matrix()
    testLabels = testData.loc[:,"Label"].as_matrix()

    trainData = trainData.loc[:,"Feature 0":].as_matrix()
    testData  = testData.loc[:,"Feature 0":].as_matrix()

    return (trainData, testData, trainLabels, testLabels)

# prepare train & test data
(X_train, X_test, y_train, y_test) = split_to_train_test(pddata)

# Convert labels to one-hot notation
Y_train = np_utils.to_categorical(y_train, 3)
Y_test  = np_utils.to_categorical(y_test, 3)

# Define model in Keras
def create_model(init):
    model = Sequential()
    model.add(Dense(101, input_shape=(101,), kernel_initializer=init, activation='tanh'))
    model.add(Dense(101, kernel_initializer=init, activation='tanh'))
    model.add(Dense(101, kernel_initializer=init, activation='tanh'))
    model.add(Dense(101, kernel_initializer=init, activation='tanh'))
    model.add(Dense(3, kernel_initializer=init, activation='softmax'))
    return model

# Train the model
uniform_model = create_model("glorot_normal")
uniform_model.compile(loss='categorical_crossentropy', optimizer='sgd', metrics=['accuracy'])
with tf.Session( config = tf.ConfigProto( log_device_placement = True ) ):
    uniform_model.fit(X_train, Y_train, batch_size=1, epochs=300, verbose=1, validation_data=(X_test, Y_test)) 

Terminal output (partial, it was way too long):

...
VarIsInitializedOp_13: (VarIsInitializedOp): /job:localhost/replica:0/task:0/device:GPU:0
2018-04-21 21:54:33.485870: I tensorflow/core/common_runtime/placer.cc:884]
VarIsInitializedOp_13: (VarIsInitializedOp)/job:localhost/replica:0/task:0/device:GPU:0
training/SGD/mul_18/ReadVariableOp: (ReadVariableOp): /job:localhost/replica:0/task:0/device:GPU:0
2018-04-21 21:54:33.485895: I tensorflow/core/common_runtime/placer.cc:884]
training/SGD/mul_18/ReadVariableOp: (ReadVariableOp)/job:localhost/replica:0/task:0/device:GPU:0
training/SGD/Variable_9/Read/ReadVariableOp: (ReadVariableOp): /job:localhost/replica:0/task:0/device:GPU:0
2018-04-21 21:54:33.485903: I tensorflow/core/common_runtime/placer.cc:884]
training/SGD/Variable_9/Read/ReadVariableOp: (ReadVariableOp)/job:localhost/replica:0/task:0/device:GPU:0
...

Note the GPU:0 at the end of many lines.

The relevant page of the TensorFlow manual: Using GPUs: Logging Device Placement.
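For later readers on TF 2.x, where there is no session to configure, a one-line sketch of the equivalent (assuming tf.debugging.set_log_device_placement, available since TF 2.0):

import tensorflow as tf

# Call once, before any ops are created; every subsequent op
# then logs the device it was placed on.
tf.debugging.set_log_device_placement(True)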

Peter Szoldan
  • Yes, you're right - log_device_placement indicates that my training runs on the GPU... Strange that on the GPU it takes 230 seconds per epoch, while on the MacBook it takes only 120 seconds per epoch... – Yury Kochubeev Apr 21 '18 at 22:18
  • Try it on https://colab.research.google.com as well. Make sure to go to "Runtime", "Change runtime type", and set "Hardware Accelerator" to GPU. See if it's any quicker. If yes, then the service you're using is not up to speed... – Peter Szoldan Apr 21 '18 at 22:31
  • Thank you Peter, my code actually uses the GPU, but what is strange is that GPU execution is slower than CPU... – Yury Kochubeev Apr 23 '18 at 07:36
  • Maybe try it with a different cloud provider than Paperspace, and see if that's any better. Colab is free, by the way. It might be that Paperspace has too many customers and their GPUs are overburdened. – Peter Szoldan Apr 23 '18 at 10:33

Put this near the top of your Jupyter notebook. Comment out what you don't need.

# confirm TensorFlow sees the GPU
from tensorflow.python.client import device_lib
assert 'GPU' in str(device_lib.list_local_devices())

# confirm Keras sees the GPU (for TensorFlow 1.X + Keras)
from keras import backend
assert len(backend.tensorflow_backend._get_available_gpus()) > 0

# confirm PyTorch sees the GPU
from torch import cuda
assert cuda.is_available()
assert cuda.device_count() > 0
print(cuda.get_device_name(cuda.current_device()))

NOTE: With the release of TensorFlow 2.0, Keras is now included as part of the TF API.

Originally answered here.

Paul Williams

Considering that Keras is built into TensorFlow since version 2.0:

import tensorflow as tf
tf.test.is_built_with_cuda()              # True if this TensorFlow build has CUDA support
tf.test.is_gpu_available(cuda_only=True)  # True if a CUDA-capable GPU is available

NOTE: the latter method may take several minutes to run.
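In recent TF 2.x releases tf.test.is_gpu_available is deprecated; a sketch of the replacement check, assuming TF >= 2.1 where tf.config.list_physical_devices is available:

import tensorflow as tf

# Lists visible GPUs without fully initializing them, so it
# returns quickly; an empty list means TensorFlow cannot see a GPU.
gpus = tf.config.list_physical_devices('GPU')
print(gpus)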

ivan866