I'm currently working on my final year project, which involves developing a multi-stream CNN for action recognition. The final output depends on the outputs of two independent streams (spatial and temporal). My goal is to make inference as efficient as possible, so I want the two streams to run simultaneously. By default, the forward passes run sequentially, one after the other, so the total execution time is roughly the sum of both:
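def forward(input1, input2):
    rgb = network1(input1)  # spatial stream
    of = network2(input2)   # temporal stream; only starts after the spatial stream has finished
    final_output = (rgb + of) / 2  # late fusion: average the two streams' outputs
    return final_output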
I have read up on PyTorch multiprocessing and tried an example with torch.multiprocessing.Process, but the execution time was much longer than I expected. The code is shown below.
import torch
import torchvision
import torch.multiprocessing as mp
import time
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
net1 = torchvision.models.quantization.mobilenet_v3_large(pretrained=True,quantize=False)
net2 = torchvision.models.quantization.mobilenet_v3_large(pretrained=True,quantize=False)
if __name__ == "__main__":
    inputs = torch.rand(1, 3, 224, 224)
    # One stream
    start = time.time()
    outputs = net1.forward(inputs)
    end = time.time()
    print('Time taken for forward prop on 1 stream: (sequentially)', end - start)
    # Two streams, one after the other
    start = time.time()
    outputs = net1.forward(inputs)
    outputs = net2.forward(inputs)
    end = time.time()
    print('Time taken for forward prop on 2 stream: (sequentially)', end - start)
    # Two streams, each in its own process. The timed region includes starting
    # and joining the processes, and the outputs computed in the child
    # processes are not returned to the parent.
    p1 = mp.Process(target=net1.forward, args=(inputs,))
    p2 = mp.Process(target=net2.forward, args=(inputs,))
    start = time.time()
    p1.start()
    p2.start()
    p1.join()
    p2.join()
    end = time.time()
    print('Time taken for forward prop on 2 stream: (parallel)', end - start)
This is the output:
Time taken for forward prop on 1 stream: (sequentially) 0.08776640892028809
Time taken for forward prop on 2 stream: (sequentially) 0.15159368515014648
Time taken for forward prop on 2 stream: (parallel) 3.8684606552124023
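Each figure above comes from a single call timed with time.time(), so the first measurement can include one-time start-up costs. A minimal sketch of a steadier measurement, assuming any plain callable (the `benchmark` helper and its `warmup`/`repeats` parameters are hypothetical, not part of PyTorch), discards a few warm-up calls and reports the median of several timed runs:

```python
import statistics
import time
from typing import Any, Callable


def benchmark(fn: Callable[..., Any], *args: Any, warmup: int = 3, repeats: int = 20) -> float:
    """Return the median wall-clock time in seconds of fn(*args).

    Hypothetical helper: the first `warmup` calls are not timed, so one-time
    costs (lazy initialisation, allocation, caching) do not skew the result.
    """
    for _ in range(warmup):
        fn(*args)
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()  # monotonic, high-resolution clock
        fn(*args)
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)


if __name__ == "__main__":
    # Stand-in workload; with the networks above this would be benchmark(net1, inputs)
    print(f"median: {benchmark(sum, range(100_000)):.6f} s")
```

If the models run on a GPU, `torch.cuda.synchronize()` has to be called before each clock read, because CUDA kernels are launched asynchronously; this stand-alone helper leaves that out. For inference, running the calls under `torch.no_grad()` also avoids recording the autograd graph.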
The two-process version gives no speedup at all: it is roughly 25 times slower than running the two forward passes one after the other. Any idea how I can make the forward propagation of both networks run simultaneously?
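One alternative to separate processes is a thread pool that is created once and reused, so no start-up cost is paid per inference. Many PyTorch operators release the GIL while they execute, so two forward passes submitted from different threads can overlap. The sketch below assumes that; `fused_forward` and `slow_stream` are hypothetical names, and `time.sleep` stands in for a forward pass. A real speedup is not guaranteed: on a CPU each forward pass already uses several intra-op threads, so two concurrent passes compete for the same cores.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable


def fused_forward(network1: Callable[[Any], Any], network2: Callable[[Any], Any],
                  input1: Any, input2: Any, pool: ThreadPoolExecutor) -> Any:
    # Submit both streams before waiting on either, so they can run concurrently.
    rgb_future = pool.submit(network1, input1)
    of_future = pool.submit(network2, input2)
    # .result() blocks until that stream has finished (and re-raises its exceptions).
    rgb = rgb_future.result()
    of = of_future.result()
    # Late fusion by averaging, as in the original forward function.
    return (rgb + of) / 2


if __name__ == "__main__":
    def slow_stream(x):
        time.sleep(0.2)  # stand-in for a forward pass that releases the GIL
        return x

    # Create the pool once and reuse it across inferences.
    with ThreadPoolExecutor(max_workers=2) as pool:
        start = time.perf_counter()
        out = fused_forward(slow_stream, slow_stream, 1.0, 3.0, pool)
        # Both sleeps overlap, so this takes roughly 0.2 s rather than 0.4 s.
        print(out, f"{time.perf_counter() - start:.2f} s")
```

With the networks from the question, the call would be `fused_forward(net1, net2, input1, input2, pool)`; the outputs come back to the calling thread, unlike the return values of `mp.Process` targets.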