I'm trying to convert a YOLOv7 PyTorch model to a TorchScript '.ts' file with Torch-TensorRT, using the following code:
import gc

import torch
import torch_tensorrt

gc.collect()
torch.cuda.empty_cache()

# Trace the model with a dummy FP16 input of the deployment shape
traced_model = torch.jit.trace(
    model, torch.empty([10, 3, 768, 1280]).half().to(self.device)
)

# Compile the traced module with Torch-TensorRT at FP16 precision
trt_script_module = torch_tensorrt.compile(
    traced_model,
    inputs=[
        torch_tensorrt.Input(
            min_shape=[10, 3, 768, 1280],
            opt_shape=[10, 3, 768, 1280],
            max_shape=[10, 3, 768, 1280],
            dtype=torch.half,
        )
    ],
    enabled_precisions={torch.half},
)
torch.jit.save(trt_script_module, "yolo_trt_script.ts")
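For scale (my own back-of-the-envelope arithmetic, not something from any documentation): the dummy input tensor itself is tiny next to the 8 GB card, so I assume the bulk of the allocation comes from the model weights plus the intermediate activations produced during tracing:

# fp16 dummy input: 10 x 3 x 768 x 1280 elements, 2 bytes each
print(10 * 3 * 768 * 1280 * 2 / 2**20)  # 56.25 MiB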
But the torch.jit.trace call raises the following runtime error:
RuntimeError: CUDA out of memory. Tried to allocate 38.00 MiB (GPU 0; 7.79 GiB total capacity; 6.52 GiB already allocated; 34.69 MiB free; 6.61 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
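The error message itself suggests setting max_split_size_mb to avoid fragmentation. A minimal sketch of how I read that hint (the 128 MiB value is just a guess on my part, not a documented recommendation):

import os

# Must be set before the first CUDA allocation in the process
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"

import torch  # import torch only after the variable is set

However, in my case reserved memory (6.61 GiB) is barely above allocated memory (6.52 GiB), so following the error message's own logic, fragmentation may not be the real problem here.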
More details:
- OS: Ubuntu 20.04 LTS
- GPU: NVIDIA GeForce RTX 2070 (8GB)
- Driver Version: 495.29.05
- Torch version: 1.11.0+cu115
- Torch-TensorRT version: 1.1.0
- CUDA: 11.5
Two questions arise:
- Why could I generate an '.onnx' file under Windows 10 on the same machine, but not under Ubuntu (I have also tried exporting '.onnx' files there)?
- Is there a way to reduce the GPU memory needed while converting the file? I have searched for an answer to this problem but haven't found anything. (One direction I'm considering is sketched after the list below.)
What I have already tried:
- Closing other processes
- Upgrading the torch version
- Emptying the CUDA cache
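One direction I'm considering (an untested sketch on my side): tracing under torch.no_grad() with a batch of one, so the trace itself allocates far less, and letting Torch-TensorRT cover the real batch size through its dynamic-shape inputs. This assumes the trace doesn't hard-code the batch dimension somewhere inside the model:

import torch
import torch_tensorrt

model.eval()  # inference mode; `model` is the loaded YOLOv7 module as above

# no_grad() keeps autograd from retaining intermediate activations,
# and a batch of 1 shrinks the activations themselves
with torch.no_grad():
    traced_model = torch.jit.trace(
        model, torch.empty([1, 3, 768, 1280]).half().to(self.device)
    )

# Let TensorRT handle batches from 1 up to the 10 I actually need
trt_script_module = torch_tensorrt.compile(
    traced_model,
    inputs=[
        torch_tensorrt.Input(
            min_shape=[1, 3, 768, 1280],
            opt_shape=[10, 3, 768, 1280],
            max_shape=[10, 3, 768, 1280],
            dtype=torch.half,
        )
    ],
    enabled_precisions={torch.half},
)
torch.jit.save(trt_script_module, "yolo_trt_script.ts")

I don't know whether the TensorRT engine build itself would still run out of memory, but the tracing step should need much less this way.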