Assume my model uses only one GPU, but the virtual machine has 4.

How can I leverage all GPUs for this code?

import numpy as np
import torch.optim as optim

# random search over regularization, learning rate, channel counts and kernel sizes
channel_1_range = [8, 16, 32, 64]
channel_2_range = [8, 16, 32, 64]
kernel_size_1_range = [3, 5, 7]
kernel_size_2_range = [3, 5, 7]
max_count = 40
for count in range(max_count):
    reg = 10**np.random.uniform(-3, 0)
    learning_rate = 10**np.random.uniform(-6, -3)
    channel_1 = channel_1_range[np.random.randint(low=0, high=len(channel_1_range))]
    channel_2 = channel_2_range[np.random.randint(low=0, high=len(channel_2_range))]
    kernel_size_1 = kernel_size_1_range[np.random.randint(low=0, high=len(kernel_size_1_range))]
    kernel_size_2 = kernel_size_2_range[np.random.randint(low=0, high=len(kernel_size_2_range))]

    model = ThreeLayerConvNet(in_channel=3, channel_1=channel_1, kernel_size_1=kernel_size_1,
                              channel_2=channel_2, kernel_size_2=kernel_size_2, num_classes=10)
    optimizer = optim.Adam(model.parameters(), lr=learning_rate)
    engine = Engine(loader_train=loader_train, loader_val=loader_val, device=device,
                    dtype=dtype, print_every=100, verbose=False)
    engine.train(model, optimizer, epochs=1, reg=reg)

    print("Reg: {0:.2E}, LR: {1:.2E}, Ch_1: {2:2} [{4}], Ch_2: {3:2} [{5}], Acc: {6:.2f} [{7:.2f}], {8:.2f} secs"
          .format(reg, learning_rate, channel_1, channel_2, kernel_size_1, kernel_size_2,
                  engine.accuracy, engine.accuracy_train, engine.duration))

One option is to move this into a standalone console app, start N instances (N == number of GPUs), and aggregate the results into one output file.

Is it possible to do it directly in Python, so I can continue to use a Jupyter notebook?

ZakiMa

1 Answer


In PyTorch you can distribute your models across different GPUs. I think in your case it's the device parameter that lets you specify the actual GPU:

import torch

device1 = torch.device('cuda:0')
device2 = torch.device('cuda:1')
# ... one device handle per GPU
devicen = torch.device('cuda:n')
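
A model (and its inputs) then has to be moved onto the chosen device. A minimal runnable illustration, with a generic nn.Linear standing in for the ThreeLayerConvNet from the question:

import torch
import torch.nn as nn

device1 = torch.device('cuda:0')
model = nn.Linear(10, 2).to(device1)       # parameters now live on cuda:0
x = torch.randn(4, 10, device=device1)     # inputs must be on the same device
y = model(x)                               # forward pass runs on cuda:0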

I don't remember the exact details, but if my memory serves me well, you need to make your code non-blocking by using threading or multiprocessing (better go with multiprocessing to be sure; otherwise the GIL might cause you problems if you fully utilize your processor). In your case that means parallelising your for loop: for instance, put all hyper-parameter configurations in a Queue and spawn one worker process per GPU to consume them (see the sketch below).
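
A minimal sketch of that approach, assuming it runs as a script. ThreeLayerConvNet and Engine are taken from your question; make_loaders() is a hypothetical helper that builds the train/val DataLoaders inside each worker process:

import numpy as np
import torch
import torch.multiprocessing as mp
import torch.optim as optim

def worker(gpu_id, task_queue, result_queue):
    device = torch.device('cuda:{}'.format(gpu_id))
    loader_train, loader_val = make_loaders()        # hypothetical helper
    while True:
        cfg = task_queue.get()
        if cfg is None:                              # sentinel: no more work
            break
        model = ThreeLayerConvNet(in_channel=3, num_classes=10,
                                  channel_1=cfg['channel_1'], kernel_size_1=cfg['kernel_size_1'],
                                  channel_2=cfg['channel_2'], kernel_size_2=cfg['kernel_size_2'])
        optimizer = optim.Adam(model.parameters(), lr=cfg['lr'])
        engine = Engine(loader_train=loader_train, loader_val=loader_val, device=device,
                        dtype=torch.float32, print_every=100, verbose=False)
        engine.train(model, optimizer, epochs=1, reg=cfg['reg'])
        result_queue.put((cfg, engine.accuracy))

if __name__ == '__main__':
    mp.set_start_method('spawn')                     # required when workers use CUDA
    n_trials = 40
    n_gpus = torch.cuda.device_count()
    task_queue, result_queue = mp.Queue(), mp.Queue()

    for _ in range(n_trials):                        # enqueue all configurations
        task_queue.put({
            'reg': 10**np.random.uniform(-3, 0),
            'lr': 10**np.random.uniform(-6, -3),
            'channel_1': int(np.random.choice([8, 16, 32, 64])),
            'channel_2': int(np.random.choice([8, 16, 32, 64])),
            'kernel_size_1': int(np.random.choice([3, 5, 7])),
            'kernel_size_2': int(np.random.choice([3, 5, 7])),
        })
    for _ in range(n_gpus):                          # one stop sentinel per worker
        task_queue.put(None)

    procs = [mp.Process(target=worker, args=(i, task_queue, result_queue))
             for i in range(n_gpus)]
    for p in procs:
        p.start()
    results = [result_queue.get() for _ in range(n_trials)]   # blocks until all trials finish
    for p in procs:
        p.join()
    for cfg, acc in results:
        print(cfg, acc)

One caveat for Jupyter: with the 'spawn' start method the worker function has to be importable, so from a notebook you would typically put worker (plus the model and engine definitions) into a small .py module and import it, while the queue/launch code can stay in a cell.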

So to answer your question: yes, you can do it in pure Python (I did it a while back, so I'm 100% positive). You can even let one GPU process multiple models (but make sure to calculate your VRAM requirements beforehand). Whether it's actually worth it compared to just starting multiple jobs is up to you, though.

As a little side note: if you run it as a 'standalone' script, all instances might still end up on the same GPU if the GPU number isn't adjusted per instance; otherwise PyTorch might try to use DataParallel distribution...
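
For completeness: when launching separate script instances, the usual way to pin each one to its own card is the CUDA_VISIBLE_DEVICES environment variable (e.g. CUDA_VISIBLE_DEVICES=1 python search.py from the shell, with search.py standing in for your script). From Python it has to be set before CUDA is initialised:

import os
os.environ['CUDA_VISIBLE_DEVICES'] = '1'   # must be set before the first CUDA call

import torch
print(torch.cuda.device_count())           # reports 1; 'cuda:0' now maps to physical GPU 1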

meow
  • Yes, I understand how to point a model to a particular device (a console app can take it as an input parameter). Re: DataParallel - I thought it was for running one model on multiple GPUs? I think my question here is more about multiprocessing... – ZakiMa Jul 24 '19 at 18:37
  • If you can give it as an input parameter, it is fine. I just added it so you are aware that the default behavior is often to use DataParallel, which, as you said, parallelizes a single model based on the data (minibatches) and wouldn't do what you intended. – meow Jul 24 '19 at 18:43
  • Added some points to give you a more concrete idea; if I find the time, I'll add some more code. – meow Jul 24 '19 at 18:50