Is it possible to add the memory of 2 graphics cards together to run a larger neural network?

Question

If I have a graphics card with 24 GB of RAM, can I add a second, identical card to double my memory to 48 GB?

I want to run a large 3D-UNet, but I am limited by the size of the volumes I am passing in. Will adding a second card allow me to process a larger volume?

**Update:** I am running on Linux (Red Hat Enterprise Linux 8). My code works to train on both GPUs.

**Code update:**

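# Imports assumed here (not shown in the original post); tf.keras is a guess at the setup.
import tensorflow as tf
from tensorflow.keras.layers import (Input, Conv3D, Conv3DTranspose, MaxPooling3D,
                                     Dropout, concatenate)
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.callbacks import ModelCheckpoint

def get_model(optimizer, loss_metric, metrics, lr=1e-3):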
    inputs = Input((sample_width, sample_height, sample_depth, 1))
    with tf.device('/device:gpu:0'): 
        conv1 = Conv3D(32, (3, 3, 3), activation='relu', padding='same')(inputs)
        conv1 = Conv3D(32, (3, 3, 3), activation='relu', padding='same')(conv1)
        pool1 = MaxPooling3D(pool_size=(2, 2, 2))(conv1)
        drop1 = Dropout(0.5)(pool1)
        conv2 = Conv3D(64, (3, 3, 3), activation='relu', padding='same')(drop1)
        conv2 = Conv3D(64, (3, 3, 3), activation='relu', padding='same')(conv2)
        pool2 = MaxPooling3D(pool_size=(2, 2, 2))(conv2)
        drop2 = Dropout(0.5)(pool2)
        conv3 = Conv3D(128, (3, 3, 3), activation='relu', padding='same')(drop2)
        conv3 = Conv3D(128, (3, 3, 3), activation='relu', padding='same')(conv3)
        pool3 = MaxPooling3D(pool_size=(2, 2, 2))(conv3)
        drop3 = Dropout(0.3)(pool3)
        conv4 = Conv3D(256, (3, 3, 3), activation='relu', padding='same')(drop3)
        conv4 = Conv3D(256, (3, 3, 3), activation='relu', padding='same')(conv4)
        pool4 = MaxPooling3D(pool_size=(2, 2, 2))(conv4)
        drop4 = Dropout(0.3)(pool4)
        conv5 = Conv3D(512, (3, 3, 3), activation='relu', padding='same')(drop4)
        conv5 = Conv3D(512, (3, 3, 3), activation='relu', padding='same')(conv5)
    with tf.device('/device:gpu:1'):
        up6 = concatenate([Conv3DTranspose(256, (2, 2, 2), strides=(2, 2, 2), padding='same')(conv5), conv4], axis=4)
        conv6 = Conv3D(256, (3, 3, 3), activation='relu', padding='same')(up6)
        conv6 = Conv3D(256, (3, 3, 3), activation='relu', padding='same')(conv6)
        up7 = concatenate([Conv3DTranspose(128, (2, 2, 2), strides=(2, 2, 2), padding='same')(conv6), conv3], axis=4)
        conv7 = Conv3D(128, (3, 3, 3), activation='relu', padding='same')(up7)
        conv7 = Conv3D(128, (3, 3, 3), activation='relu', padding='same')(conv7)
        up8 = concatenate([Conv3DTranspose(64, (2, 2, 2), strides=(2, 2, 2), padding='same')(conv7), conv2], axis=4)
        conv8 = Conv3D(64, (3, 3, 3), activation='relu', padding='same')(up8)
        conv8 = Conv3D(64, (3, 3, 3), activation='relu', padding='same')(conv8)
        up9 = concatenate([Conv3DTranspose(32, (2, 2, 2), strides=(2, 2, 2), padding='same')(conv8), conv1], axis=4)
        conv9 = Conv3D(32, (3, 3, 3), activation='relu', padding='same')(up9)
        conv9 = Conv3D(32, (3, 3, 3), activation='relu', padding='same')(conv9)
        conv10 = Conv3D(1, (1, 1, 1), activation='sigmoid')(conv9)
    model = Model(inputs=[inputs], outputs=[conv10])    
    model.compile(optimizer=optimizer(lr=lr), loss=loss_metric, metrics=metrics)    
    return model


model = get_model(optimizer=Adam, loss_metric=dice_coef_loss, metrics=[dice_coef], lr=1e-3)
model_checkpoint = ModelCheckpoint('save.model', monitor=observe_var, save_best_only=False, period = 1000)
model.fit(train_x, train_y, batch_size = 1, epochs= 2000, verbose=1, shuffle=True, validation_split=0.2, callbacks=[model_checkpoint])
model.save('final_save.model')

score 1 · Answer 1 · answered Feb 01 '20 at 01:50

I believe it is not currently possible to combine multiple GPUs to create a single abstract GPU with the combined memory. However, you can do something similar: split a model across multiple GPUs, which will still have the desired effect of being able to run models larger than any individual GPU's memory.

The issue is that doing this requires manually specifying which parts of the model will run on each device, which can be difficult to do efficiently. I'm also not sure how it can be done with a premade model.

The general pattern looks like this:

import tensorflow as tf

with tf.device('/gpu:0'):
    ...  # create the first half of the model here

with tf.device('/gpu:1'):
    ...  # create the other half of the model here

# then combine the two halves
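
Choosing where to cut the model is the hard part. As a rough sketch (not something from TensorFlow itself), a contiguous split point can be picked by minimizing the larger of the two halves, given a per-layer memory estimate. The function name `best_split` and the example costs are hypothetical, and the sketch ignores skip connections and the cost of copying tensors between devices.

```python
def best_split(costs):
    """Pick the contiguous cut that minimizes the larger half.

    costs: per-layer memory estimates (any consistent unit), in model order.
    Returns (index, first_half_cost, second_half_cost), where layers
    costs[:index] would go on the first device and costs[index:] on the second.
    """
    if len(costs) < 2:
        raise ValueError("need at least two layers to split")

    total = sum(costs)
    running = 0
    best = None  # (worst_half, index, first, second)
    for index in range(1, len(costs)):
        running += costs[index - 1]
        first, second = running, total - running
        worst = max(first, second)
        # Strict '<' keeps the earliest cut when several cuts tie.
        if best is None or worst < best[0]:
            best = (worst, index, first, second)

    _, index, first, second = best
    return index, first, second


# Hypothetical per-layer costs in GiB, purely for illustration.
example_costs = [4, 1, 1, 4]
cut, first_half, second_half = best_split(example_costs)
print(cut, first_half, second_half)
```

The layers before the returned index would then go inside the first `tf.device` block and the rest inside the second. If either half still exceeds one card's memory, no two-way contiguous split will fit.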

The Guy with The Hat

I tried to add `with tf.device()` but I still get the ResourceExhaustedError. I am trying to put the down section of the UNet code on gpu:0 and the up section of the UNet code on gpu:1. I added my code to the original post. Do I need to add something at the end of the model to combine the 2 halves or does it do it automatically? – rzaratx Feb 03 '20 at 18:47
@RicardoZaragoza I don't have any real experience splitting models, so I'm not sure. Are you sure that the model is small enough for each half to fit in each GPU? Google around and see what others are doing, e.g. https://www.tensorflow.org/guide/gpu and https://blog.keras.io/keras-as-a-simplified-interface-to-tensorflow-tutorial.html#multi-gpu-and-distributed-training – The Guy with The Hat Feb 03 '20 at 19:05
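
One way to check whether each half is small enough is to add up the layer outputs by hand. The sketch below does that for the 3D-UNet posted in the question, assuming float32 activations and a volume whose sides are divisible by 16. The 256-voxel side length is a hypothetical value, not the asker's, and the estimate covers forward layer outputs only: weights, gradients, and framework workspace are not counted, so real usage will be higher.

```python
BYTES_PER_FLOAT32 = 4

# (output_channels, downsampling_level) for each layer output, read off the
# posted model. Level k means each spatial side is divided by 2**k.
ENCODER = [  # the part placed on gpu:0 in the question
    (32, 0), (32, 0), (32, 1), (32, 1),        # conv1 x2, pool1, drop1
    (64, 1), (64, 1), (64, 2), (64, 2),        # conv2 x2, pool2, drop2
    (128, 2), (128, 2), (128, 3), (128, 3),    # conv3 x2, pool3, drop3
    (256, 3), (256, 3), (256, 4), (256, 4),    # conv4 x2, pool4, drop4
    (512, 4), (512, 4),                        # conv5 x2
]
DECODER = [  # the part placed on gpu:1 in the question
    (256, 3), (512, 3), (256, 3), (256, 3),    # transpose, concat, conv6 x2
    (128, 2), (256, 2), (128, 2), (128, 2),    # transpose, concat, conv7 x2
    (64, 1), (128, 1), (64, 1), (64, 1),       # transpose, concat, conv8 x2
    (32, 0), (64, 0), (32, 0), (32, 0),        # transpose, concat, conv9 x2
    (1, 0),                                    # conv10
]


def activation_bytes(layers, width, height, depth, batch_size=1):
    """Sum the size in bytes of every listed layer output (forward pass only)."""
    total = 0
    for channels, level in layers:
        scale = 2 ** level
        voxels = (width // scale) * (height // scale) * (depth // scale)
        total += batch_size * channels * voxels * BYTES_PER_FLOAT32
    return total


side = 256  # hypothetical volume side length
for name, layers in (("encoder (gpu:0)", ENCODER), ("decoder (gpu:1)", DECODER)):
    gib = activation_bytes(layers, side, side, side) / 2**30
    print(f"{name}: {gib:.2f} GiB of forward activations")
```

By this count the decoder half holds more than twice the forward activations of the encoder half (213.5 versus 95.875 floats per input voxel), so cutting at the bottleneck leaves most of the load on gpu:1. The skip connections (conv1 through conv4) also have to be available on gpu:1 for the concatenations.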

score 0 · Answer 2 · answered Apr 24 '23 at 19:23

People are running the NeoX 20B model on two RTX 3090s. I don't know whether this was unsupported three years ago, but it works now: you won't see 48 GB reported as a single pool in the system, but Python/TensorFlow/Torch can use the VRAM of both cards. I will be trying to combine two different cards, a 3090 and a 3060, and will see whether StableLM/NeoX fine-tuning goes smoothly.

https://youtu.be/bAY85Om5O6A

score -1 · Answer 3 · answered Jan 31 '20 at 22:17

The short answer is yes, but in practice it comes down to whether the software that accesses memory on your behalf can use both cards. I know very little about these operating systems, but I believe CUDA may be a place to start looking.

Mikesplace
