
If I have a graphics card with 24 GB of RAM, can I add a second, identical card to double my memory to 48 GB?

I want to run a large 3D-UNet, but I am limited by the size of the volumes I am passing in. Will adding a second card let me use larger volumes?
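A rough way to see why volume size hits the memory limit so quickly: a float32 feature map of a 3D volume takes width × height × depth × channels × 4 bytes, so doubling each side of the input cube multiplies activation memory by 8, while a second 24 GB card at most doubles the memory available. A minimal sketch, assuming float32 values and batch size 1, and counting a single feature map (weights, gradients, optimizer state, and cuDNN workspace are ignored, so real usage is much higher):

```python
GIB = 1024 ** 3  # 1 GiB in bytes


def feature_map_bytes(width, height, depth, channels, bytes_per_value=4):
    """Size in bytes of one float32 feature map for a batch of size 1."""
    return width * height * depth * channels * bytes_per_value


# One 32-channel feature map (like conv1 in a 3D-UNet) for a few cube sizes.
for side in (64, 128, 256):
    size = feature_map_bytes(side, side, side, 32)
    print(f"{side}^3 x 32 channels: {size / GIB:.2f} GiB")

# prints:
# 64^3 x 32 channels: 0.03 GiB
# 128^3 x 32 channels: 0.25 GiB
# 256^3 x 32 channels: 2.00 GiB
```

Each doubling of the side costs 8× the memory, so a second card buys roughly one step of about 1.26× per side (the cube root of 2), not a doubling of the volume edge.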

**Update:** I am running on Linux (Red Hat Enterprise Linux 8). My code can train on both GPUs.

**Code update:**

```python
# Imports assumed for TensorFlow 2.x with tf.keras.
import tensorflow as tf
from tensorflow.keras.callbacks import ModelCheckpoint
from tensorflow.keras.layers import (Conv3D, Conv3DTranspose, Dropout, Input,
                                     MaxPooling3D, concatenate)
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import Adam

# sample_width, sample_height, sample_depth, dice_coef_loss, dice_coef,
# observe_var, train_x and train_y are defined elsewhere in my script.


def get_model(optimizer, loss_metric, metrics, lr=1e-3):
    inputs = Input((sample_width, sample_height, sample_depth, 1))
    # Encoder (down path) on the first GPU
    with tf.device('/device:gpu:0'):
        conv1 = Conv3D(32, (3, 3, 3), activation='relu', padding='same')(inputs)
        conv1 = Conv3D(32, (3, 3, 3), activation='relu', padding='same')(conv1)
        pool1 = MaxPooling3D(pool_size=(2, 2, 2))(conv1)
        drop1 = Dropout(0.5)(pool1)
        conv2 = Conv3D(64, (3, 3, 3), activation='relu', padding='same')(drop1)
        conv2 = Conv3D(64, (3, 3, 3), activation='relu', padding='same')(conv2)
        pool2 = MaxPooling3D(pool_size=(2, 2, 2))(conv2)
        drop2 = Dropout(0.5)(pool2)
        conv3 = Conv3D(128, (3, 3, 3), activation='relu', padding='same')(drop2)
        conv3 = Conv3D(128, (3, 3, 3), activation='relu', padding='same')(conv3)
        pool3 = MaxPooling3D(pool_size=(2, 2, 2))(conv3)
        drop3 = Dropout(0.3)(pool3)
        conv4 = Conv3D(256, (3, 3, 3), activation='relu', padding='same')(drop3)
        conv4 = Conv3D(256, (3, 3, 3), activation='relu', padding='same')(conv4)
        pool4 = MaxPooling3D(pool_size=(2, 2, 2))(conv4)
        drop4 = Dropout(0.3)(pool4)
        conv5 = Conv3D(512, (3, 3, 3), activation='relu', padding='same')(drop4)
        conv5 = Conv3D(512, (3, 3, 3), activation='relu', padding='same')(conv5)
    # Decoder (up path) on the second GPU; the skip tensors conv1..conv4 were created under gpu:0
    with tf.device('/device:gpu:1'):
        up6 = concatenate([Conv3DTranspose(256, (2, 2, 2), strides=(2, 2, 2), padding='same')(conv5), conv4], axis=4)
        conv6 = Conv3D(256, (3, 3, 3), activation='relu', padding='same')(up6)
        conv6 = Conv3D(256, (3, 3, 3), activation='relu', padding='same')(conv6)
        up7 = concatenate([Conv3DTranspose(128, (2, 2, 2), strides=(2, 2, 2), padding='same')(conv6), conv3], axis=4)
        conv7 = Conv3D(128, (3, 3, 3), activation='relu', padding='same')(up7)
        conv7 = Conv3D(128, (3, 3, 3), activation='relu', padding='same')(conv7)
        up8 = concatenate([Conv3DTranspose(64, (2, 2, 2), strides=(2, 2, 2), padding='same')(conv7), conv2], axis=4)
        conv8 = Conv3D(64, (3, 3, 3), activation='relu', padding='same')(up8)
        conv8 = Conv3D(64, (3, 3, 3), activation='relu', padding='same')(conv8)
        up9 = concatenate([Conv3DTranspose(32, (2, 2, 2), strides=(2, 2, 2), padding='same')(conv8), conv1], axis=4)
        conv9 = Conv3D(32, (3, 3, 3), activation='relu', padding='same')(up9)
        conv9 = Conv3D(32, (3, 3, 3), activation='relu', padding='same')(conv9)
        conv10 = Conv3D(1, (1, 1, 1), activation='sigmoid')(conv9)
    model = Model(inputs=[inputs], outputs=[conv10])
    # `lr` is a deprecated alias; `learning_rate` is the current keyword
    model.compile(optimizer=optimizer(learning_rate=lr), loss=loss_metric, metrics=metrics)
    return model


model = get_model(optimizer=Adam, loss_metric=dice_coef_loss, metrics=[dice_coef], lr=1e-3)
# `period` (epochs) is deprecated in tf.keras in favor of `save_freq`, which counts batches
model_checkpoint = ModelCheckpoint('save.model', monitor=observe_var, save_best_only=False, period=1000)
model.fit(train_x, train_y, batch_size=1, epochs=2000, verbose=1, shuffle=True, validation_split=0.2, callbacks=[model_checkpoint])
model.save('final_save.model')
```
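One way to check whether each half of this split can fit on its card is to count the layer outputs that `get_model()` creates under each `tf.device` scope. This is a rough sketch under stated assumptions: a cubic input whose side is divisible by 16, batch size 1, float32, each layer output counted once, and weights, gradients, optimizer state, and cuDNN workspace ignored, so real training usage is several times higher. The names `ENCODER`, `DECODER`, and `activation_bytes` are hypothetical helpers, not TensorFlow APIs.

```python
# Each entry is (downsampling level, channels) for one layer output of the
# model above; a tensor at level k has (side / 2**k)**3 voxels.
ENCODER = [  # built under tf.device('/device:gpu:0')
    (0, 32), (0, 32),                        # conv1 x2
    (1, 32), (1, 32), (1, 64), (1, 64),      # pool1, drop1, conv2 x2
    (2, 64), (2, 64), (2, 128), (2, 128),    # pool2, drop2, conv3 x2
    (3, 128), (3, 128), (3, 256), (3, 256),  # pool3, drop3, conv4 x2
    (4, 256), (4, 256), (4, 512), (4, 512),  # pool4, drop4, conv5 x2
]
DECODER = [  # built under tf.device('/device:gpu:1')
    (3, 256), (3, 512), (3, 256), (3, 256),      # transpose, up6, conv6 x2
    (2, 128), (2, 256), (2, 128), (2, 128),      # transpose, up7, conv7 x2
    (1, 64), (1, 128), (1, 64), (1, 64),         # transpose, up8, conv8 x2
    (0, 32), (0, 64), (0, 32), (0, 32), (0, 1),  # transpose, up9, conv9 x2, conv10
]

GIB = 1024 ** 3


def activation_bytes(layers, side, bytes_per_value=4):
    """Sum of layer-output sizes in bytes for a cubic input of the given side."""
    return sum((side // 2 ** level) ** 3 * channels * bytes_per_value
               for level, channels in layers)


side = 128
print(f"gpu:0 (encoder): {activation_bytes(ENCODER, side) / GIB:.2f} GiB")  # 0.75 GiB
print(f"gpu:1 (decoder): {activation_bytes(DECODER, side) / GIB:.2f} GiB")  # 1.67 GiB
```

Under these assumptions the decoder half holds more than twice the activation memory of the encoder half, because the full-resolution transposed convolution, the 64-channel `up9` concatenation, and `conv9`/`conv10` all sit on gpu:1. That suggests gpu:1 may run out of memory first even though the model is split, and that a cut placed after the bottleneck leaves the two cards unevenly loaded.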
Nick ODell
rzaratx

3 Answers


I believe it is not currently possible to combine multiple GPUs into a single abstract GPU with their combined memory. However, you can do something similar: split a model across multiple GPUs, which still lets you run models larger than any individual GPU's memory.

The catch is that this requires manually specifying which parts of the model run on each device, which can be hard to do efficiently. I'm also not sure how it can be done with a premade model.
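Choosing where to cut is essentially a balancing problem: you want the larger of the two halves to be as small as possible. A minimal sketch, assuming the model is a plain sequential chain of layers with known per-layer memory costs (a U-Net's skip connections break this assumption, since skip tensors are needed on both sides of the cut). The function `best_split` and the example costs are hypothetical illustrations, not part of TensorFlow.

```python
from itertools import accumulate


def best_split(costs):
    """Split a sequential layer list into costs[:index] (first GPU) and
    costs[index:] (second GPU) so that the larger half is as small as possible.
    Returns (index, cost of the larger half)."""
    total = sum(costs)
    prefix = [0, *accumulate(costs)]  # prefix[i] = cost of the first i layers
    best_index = min(range(len(costs) + 1),
                     key=lambda i: max(prefix[i], total - prefix[i]))
    return best_index, max(prefix[best_index], total - prefix[best_index])


# Hypothetical per-layer memory costs (arbitrary units) for a 6-layer chain.
print(best_split([8, 8, 4, 4, 2, 6]))  # (2, 16)
```

Cutting by layer count (here, after layer 3) would leave 20 units on the first GPU; balancing by cost cuts after layer 2 and caps each card at 16.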

The general pattern looks like this:

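```python
import tensorflow as tf

with tf.device('/gpu:0'):
    ...  # create half the model

with tf.device('/gpu:1'):
    ...  # create the other half of the model, feeding it the first half's outputs

# combine the two halves, e.g. into one Model from the first half's inputs
# to the second half's outputs
```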

More reading:

The Guy with The Hat
  • I tried adding `with tf.device()`, but I still get the ResourceExhaustedError. I am trying to put the down section of the UNet on gpu:0 and the up section on gpu:1. I added my code to the original post. Do I need to add something at the end of the model to combine the two halves, or is that done automatically? – rzaratx Feb 03 '20 at 18:47
  • @RicardoZaragoza I don't have any real experience splitting models, so I'm not sure. Are you sure the model is small enough for each half to fit on its GPU? Search around and see what others are doing, e.g. https://www.tensorflow.org/guide/gpu and https://blog.keras.io/keras-as-a-simplified-interface-to-tensorflow-tutorial.html#multi-gpu-and-distributed-training – The Guy with The Hat Feb 03 '20 at 19:05

Some people are running the NeoX 20B model on two RTX 3090s. I don't know whether this was unsupported three years ago, but it works now: you don't see 48 GB in the system, but Python/TensorFlow/PyTorch can use the VRAM of both cards. I will be trying to combine two different cards, a 3090 and a 3060, and will see whether StableLM/NeoX fine-tuning goes smoothly.

https://youtu.be/bAY85Om5O6A
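Back-of-the-envelope arithmetic shows why a model of this size needs both cards. A minimal sketch, assuming about 20 billion parameters stored as 16-bit floats (2 bytes each) and 24 GiB of VRAM per RTX 3090; it counts weights only, so activations and any KV cache come on top:

```python
GIB = 1024 ** 3  # 1 GiB in bytes


def weight_bytes(n_params, bytes_per_param):
    """Memory needed for the model weights alone."""
    return n_params * bytes_per_param


neox_fp16 = weight_bytes(20 * 10 ** 9, 2)  # ~20B params at 2 bytes each (assumed)
card = 24 * GIB                            # assumed per-card VRAM

print(f"weights: {neox_fp16 / GIB:.1f} GiB")             # weights: 37.3 GiB
print("fits on one card:", neox_fp16 <= card)            # False
print("fits across two cards:", neox_fp16 <= 2 * card)   # True
```

The weights alone exceed one card but fit within two, which is only possible if the layers are spread across both GPUs rather than the cards appearing as one 48 GB device.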

Jaroslav Štreit

The short answer is yes, but in practice it depends on the software that accesses memory on your behalf. I know very little about these operating systems, but I believe CUDA may be a good place to start looking.

Mikesplace