The tensor you are using is big, but not that big for an 8 GB GPU: 144 × 144 × 144 × 128 is ~382 million elements, so even with 32-bit items it only takes about 1.4 GiB. I have a GeForce GTX 1070 with 8 GB (the same size as yours), and here's my TensorFlow experiment:
import numpy as np
import tensorflow as tf

# A placeholder for the (1, 144, 144, 144, 128) input tensor: ~1.4 GiB of int32
X = tf.placeholder(dtype=tf.int32, shape=(1, 144, 144, 144, 128))
init = tf.global_variables_initializer()

with tf.Session() as session:
    session.run(init)
    # Feed the full tensor of zeros and fetch it back
    value = session.run([X], feed_dict={X: np.zeros(shape=(1, 144, 144, 144, 128))})
    print(np.array(value).shape)
The output:
name: GeForce GTX 1070
major: 6 minor: 1 memoryClockRate (GHz) 1.7465
pciBusID 0000:01:00.0
Total memory: 7.92GiB
Free memory: 4.14GiB
2017-08-17 20:05:54.312424: I tensorflow/core/common_runtime/gpu/gpu_device.cc:908] DMA: 0
2017-08-17 20:05:54.312430: I tensorflow/core/common_runtime/gpu/gpu_device.cc:918] 0: Y
2017-08-17 20:05:54.312444: I tensorflow/core/common_runtime/gpu/gpu_device.cc:977] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 1070, pci bus id: 0000:01:00.0)
(1, 1, 144, 144, 144, 128)
Note that free memory is much lower than 8 GB because I use two UHD monitors. So this might be the first thing to check in your case: other processes can consume a lot of GPU memory.
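You can see which processes hold GPU memory with nvidia-smi. It can also help to stop TensorFlow from reserving almost all free memory up front; here is a minimal sketch using the standard TF 1.x session config (allow_growth is the only important part, the graph itself is up to you):

import tensorflow as tf

# Allocate GPU memory on demand instead of reserving nearly all of it at startup
config = tf.ConfigProto()
config.gpu_options.allow_growth = True

with tf.Session(config=config) as session:
    pass  # run your graph here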
Next, you didn't provide your neural network architecture, but if you are using, for instance, a deep convolutional neural network, note that the first layers consume a lot of memory for activations and their gradients. You might want to read this helpful page for details. If this is the case, you might need to plug in another GPU and split the graph across all available GPUs (here's how you can do it). NVIDIA also sells GPUs with 12 GB of memory.
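For illustration, device placement in TF 1.x looks roughly like this. This is only a sketch assuming two visible GPUs; the conv3d/dense layers are stand-ins for your actual architecture:

import tensorflow as tf

inputs = tf.placeholder(tf.float32, shape=(1, 144, 144, 144, 128))

with tf.device('/gpu:0'):
    # put the memory-hungry early convolutions on the first GPU
    conv = tf.layers.conv3d(inputs, filters=32, kernel_size=3, activation=tf.nn.relu)

with tf.device('/gpu:1'):
    # keep the rest of the graph on the second GPU
    flat = tf.reshape(conv, (1, -1))
    logits = tf.layers.dense(flat, units=10)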
Finally, you can always consider reducing the floating-point precision, tf.float64 -> tf.float32 -> tf.float16, for all your variables. Each step halves the memory (up to 4x going from float64 to float16), which is sometimes just enough to fit on the GPU.
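As a sketch of the effect, here's the same experiment as above with a float16 placeholder; the feed shrinks from ~1.4 GiB to ~0.7 GiB:

import numpy as np
import tensorflow as tf

# Same shape as before, but 16-bit floats: half the memory of float32
X = tf.placeholder(dtype=tf.float16, shape=(1, 144, 144, 144, 128))

with tf.Session() as session:
    value = session.run(X, feed_dict={X: np.zeros((1, 144, 144, 144, 128), dtype=np.float16)})
    print(value.dtype, value.nbytes)  # float16, ~0.76e9 bytes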