I have two computers with the same GPU (GTX 1080) and the same OS and software installed. But when I run my TensorFlow program (an RNN model), the speeds are very different: one machine is about 1.5x faster than the other.
Here are the key specs of the two:
System A: Asus Z170-P, i7 6700T, 32GB RAM, GTX 1080.
System B: Asus X99 E-WS, i7 5930K, 128GB RAM, GTX 1080. (the problem one)
Both have the following installed (using the same method):
OS: Ubuntu 16.04
GPU driver version: 378.13
CUDA version: 8.0
cuDNN version: 5.1
TensorFlow: installed with pip install tensorflow-gpu==1.0.1
Python: Anaconda 3.6
Sample code:
import tensorflow as tf
import numpy as np
from tqdm import trange

h, w = 3000, 2000
steps = 1000

# Feed an [h, w] matrix, multiply it by a constant [w, w] matrix on the GPU,
# and feed the result back in on the next step.
x = tf.placeholder(dtype=tf.float32, shape=[h, w], name='x')
t = tf.constant(np.random.random(size=[w, w]), dtype=tf.float32)
m = tf.matmul(x, t)

x0 = np.random.random(size=[h, w])
sess = tf.Session()
for i in trange(steps):
    x0 = sess.run(m, feed_dict={x: x0})
System A runs at about 75 iterations/sec while System B only manages about 50 iterations/sec. Yes, the machine with the weaker specs is actually the faster one.
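Since the loop above pushes x0 back through feed_dict on every step, one variant worth trying (a minimal sketch, not yet run on either machine; the variable/op names are mine) keeps the data resident on the GPU as a tf.Variable updated with tf.assign. If the speed gap disappears with it, the difference is probably in the host-side copy path rather than in the GPU itself:

import tensorflow as tf
import numpy as np
from tqdm import trange

h, w = 3000, 2000
steps = 1000

# Keep x on the GPU as a variable; no per-step feed_dict copy from host memory.
t = tf.constant(np.random.random(size=[w, w]), dtype=tf.float32)
x_var = tf.Variable(np.random.random(size=[h, w]).astype(np.float32))
step_op = tf.assign(x_var, tf.matmul(x_var, t))  # write the matmul result back

sess = tf.Session()
sess.run(tf.global_variables_initializer())
for i in trange(steps):
    sess.run(step_op)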
Key observations:
- System B has far more page faults while running the program.
- Monitoring Volatile GPU-Util from nvidia-smi (a reproducible polling snippet is sketched just below), System A sits stably at about 40% while System B is only about 30%.
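A minimal sketch of how the Volatile GPU-Util figures above can be sampled reproducibly while the benchmark runs in another terminal (the query flags are standard nvidia-smi options; the one-second interval and 60-sample count are arbitrary choices):

import subprocess
import time

# Print GPU and memory-controller utilization once a second.
for _ in range(60):
    out = subprocess.check_output(
        ['nvidia-smi',
         '--query-gpu=utilization.gpu,utilization.memory',
         '--format=csv,noheader,nounits'])
    print(out.decode().strip())
    time.sleep(1)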
Things I have tried on System B:
- Upgraded the BIOS to the latest version and reset it to default settings.
- Called Asus customer service for help.
- Swapped the GPU card with System A.
- Changed the PCI-e slot to make sure it is running at x16 Gen3.
- Added LD_PRELOAD="/usr/lib/libtcmalloc.so" to the .bashrc file.
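One thing I have not tried yet is op-level profiling. Below is a minimal sketch (it assumes the sess, m, x and x0 objects from the sample code above; the output file name is arbitrary) that records a single step with full tracing and writes a Chrome trace via TensorFlow's timeline module. Comparing the traces from the two machines in chrome://tracing should show whether the extra time is spent in the MatMul kernel or in the feed/fetch path:

from tensorflow.python.client import timeline

# Trace one representative step of the benchmark above.
run_options = tf.RunOptions(trace_level=tf.RunOptions.FULL_TRACE)
run_metadata = tf.RunMetadata()
x0 = sess.run(m, feed_dict={x: x0},
              options=run_options, run_metadata=run_metadata)

# Dump a Chrome trace that can be opened in chrome://tracing.
tl = timeline.Timeline(run_metadata.step_stats)
with open('timeline.json', 'w') as f:
    f.write(tl.generate_chrome_trace_format())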
The main differences in the output of /usr/bin/time -v are:
# The first value is for System B and the second is for System A.
System time (seconds): 7.28 2.95
Percent of CPU this job got: 85% 106%
Elapsed (wall clock) time (h:mm:ss or m:ss): 0:22.41 0:14.89
Minor (reclaiming a frame) page faults: 684695 97853
Involuntary context switches: 164 91063
File system inputs: 0 24
File system outputs: 8 0
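Given the page-fault gap, it may also be worth isolating the host side completely. The sketch below (pure NumPy, no TensorFlow or GPU involved; the array size matches the feed above and the iteration count is arbitrary) just times repeated copies of a 3000x2000 float32 array. If System B is also slower here, the problem is likely host memory/allocator behaviour rather than anything GPU- or TensorFlow-specific:

import time
import numpy as np

a = np.random.random(size=[3000, 2000]).astype(np.float32)

start = time.time()
for _ in range(1000):
    b = a.copy()  # fresh allocation + copy, roughly what building a feed does
elapsed = time.time() - start
print('%.1f host copies/sec' % (1000 / elapsed))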
Can anybody point me in the right direction for how to profile/debug this issue? Many thanks in advance!