Unexpected error when loading the model: problem in predictor - ModuleNotFoundError: No module named 'torchvision'

Question

I've been trying to deploy my model to AI Platform Prediction through the console on my VM instance, but I get this error: (gcloud.beta.ai-platform.versions.create) Create Version failed. Bad model detected with error: "Failed to load model: Unexpected error when loading the model: problem in predictor - ModuleNotFoundError: No module named 'torchvision' (Error code: 0)"

I need to include both torch and torchvision. I followed the steps in the question "Cannot deploy trained model to Google Cloud Ai-Platform with custom prediction routine: Model requires more memory than allowed", but I couldn't fetch the files pointed to by user gogasca. I tried downloading this .whl file from the PyTorch website and uploading it to my cloud storage, but got the same error that there is no module torchvision, even though this version is supposed to include both torch and torchvision. I also tried using the Cloud AI compatible packages here, but they don't include torchvision.

I tried pointing to two separate .whl files for torch and torchvision in the --package-uris argument, both stored in my cloud storage, but then I got an error that the memory capacity was exceeded. This is strange, because together they are only around 130 MB. An example of a command that resulted in the missing torchvision error:

gcloud beta ai-platform versions create version_1 \
  --model online_pred_1 \
  --runtime-version 1.15 \
  --python-version 3.7 \
  --origin gs://BUCKET/model-dir \
  --package-uris gs://BUCKET/staging-dir/my_package-0.1.tar.gz,gs://BUCKET/torchvision-dir/torch-1.4.0+cpu-cp37-cp37m-linux_x86_64.whl \
  --prediction-class predictor.MyPredictor

I've tried pointing to different combinations of .whl files obtained from different sources, but got either the no-module error or the not-enough-memory error. I don't understand how the modules interact in this case and why the platform reports that there is no such module. How can I resolve this? Alternatively, how can I build a package myself that includes both torch and torchvision? Please give detailed answers, because I'm not very familiar with package management and bash scripting.

Here's the code I used, torch_model.py:

from torch import nn


class EthnicityClassifier44(nn.Module):
    def __init__(self, num_classes=2):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 32, kernel_size=7, stride=1, padding=3)
        self.maxpool1 = nn.MaxPool2d(kernel_size=2, stride=2)
        self.conv22 = nn.Conv2d(32, 32, kernel_size=3, stride=1, padding=1)
        self.maxpool2 = nn.MaxPool2d(kernel_size=2, stride=2)
        self.conv3 = nn.Conv2d(32, 64, kernel_size=3, stride=1, padding=1)
        self.maxpool3 = nn.MaxPool2d(kernel_size=2, stride=2)
        self.conv4 = nn.Conv2d(64, 128, kernel_size=3, stride=1, padding=1)
        self.maxpool4 = nn.MaxPool2d(kernel_size=2, stride=2)
        self.relu = nn.ReLU(inplace=False)
        self.fc1 = nn.Linear(8*8*128, 128)
        self.fc2 = nn.Linear(128, 128)
        self.fc4 = nn.Linear(128, num_classes)


    def forward(self, x):
        x = self.relu(self.conv1(x))
        x = self.maxpool1(x)
        x = self.relu(self.conv22(x))
        x = self.maxpool2(x)
        x = self.maxpool3(self.relu(self.conv3(x)))
        x = self.maxpool4(self.relu(self.conv4(x)))
        x = self.relu(self.fc1(x.view(x.shape[0], -1)))
        x = self.relu(self.fc2(x))
        x = self.fc4(x)
        return x

This is predictor.py:

from facenet_pytorch import MTCNN, InceptionResnetV1, extract_face
import torch
import torchvision
from torchvision import transforms
from torch.nn import functional as F
from PIL import Image
from sklearn.externals import joblib
import numpy as np
import os
import torch_model


class MyPredictor(object):

    def __init__(self, model, preprocessor, device):
        """Stores artifacts for prediction. Only initialized via `from_path`.
        """
        self._resnet = model
        self._mtcnn_mult = preprocessor
        self._device = device
        self.get_std_tensor = transforms.Compose([
            np.float32,
            np.uint8,
            transforms.ToTensor(),
        ])
        self.tensor2pil = transforms.ToPILImage(mode='RGB')
        self.trans_resnet = transforms.Compose([
            transforms.Resize((100, 100)),
            np.float32,
            transforms.ToTensor()
        ])

    def predict(self, instances, **kwargs):

        pil_transform = transforms.Resize((512, 512))

        imarr = np.asarray(instances)
        pil_im = Image.fromarray(imarr)
        image = pil_im.convert('RGB')
        pil_im_512 = pil_transform(image)

        boxes, _ = self._mtcnn_mult(pil_im_512)
        box = boxes[0]

        face_tensor = extract_face(pil_im_512, box, margin=40)
        std_tensor = self.get_std_tensor(face_tensor.permute(1, 2, 0))
        cropped_pil_im = self.tensor2pil(std_tensor)

        face_tensor = self.trans_resnet(cropped_pil_im)
        face_tensor4d = face_tensor.unsqueeze(0)
        face_tensor4d = face_tensor4d.to(self._device)

        prediction = self._resnet(face_tensor4d)
        preds = F.softmax(prediction, dim=1).detach().numpy().reshape(-1)
        print('probability of (class1, class2) = ({:.4f}, {:.4f})'.format(preds[0], preds[1]))

        return preds.tolist()

    @classmethod
    def from_path(cls, model_dir):
        import torch
        import torchvision
        import torch_model

        model_path = os.path.join(model_dir, 'class44_M40RefinedExtra_bin_no_norm_7860.joblib')
        classifier = joblib.load(model_path)

        mtcnn_path = os.path.join(model_dir, 'mtcnn_mult.joblib')
        mtcnn_mult = joblib.load(mtcnn_path)

        device_path = os.path.join(model_dir, 'device_cpu.joblib')
        device = joblib.load(device_path)

        return cls(classifier, mtcnn_mult, device)

And setup.py:

from setuptools import setup

REQUIRED_PACKAGES = ['opencv-python-headless', 'facenet-pytorch']

setup(
 name="my_package",
 version="0.1",
 include_package_data=True,
 scripts=["predictor.py", "torch_model.py"],
 install_requires=REQUIRED_PACKAGES
)

*"this version is supposed to include both torch and torchvision"* - no, that is just `torch` on its own. `torchvision` is rather small (even the GPU version is only about 20MB), so the PyPI version should be fine. But I don't really see where `torchvision` would be installed. It's not in your `REQUIRED_PACKAGES` and neither is it in the [requirements of `facenet-pytorch`](https://github.com/timesler/facenet-pytorch/blob/master/setup.py#L38-L41). Could you try adding `'torchvision==0.5.0'` to `REQUIRED_PACKAGES`? 0.5.0 because that's the version for PyTorch 1.4.0. — Michael Jungo, May 21 '20 at 16:08
@MichaelJungo, thanks for your reply! I've tried that before and got an error stating that there isn't enough space: ```(gcloud.beta.ai-platform.versions.create) Create Version failed. Bad model detected with error: Model requires more memory than allowed. Please try to decrease the model size and re-deploy.```. I tried it again and got the same error, which doesn't make sense, because ```torchvision``` is indeed pretty small. — dream_variable, May 21 '20 at 17:43
@MichaelJungo Furthermore, I find it strange that ```facenet-pytorch``` doesn't have ```torchvision``` in the setup file, because it's used in the ```detect_face``` module [here](https://github.com/timesler/facenet-pytorch/blob/master/models/utils/detect_face.py) — dream_variable, May 21 '20 at 17:44
`torchvision` depends on `torch`, so it will automatically try to install it as well. You then need to make sure that it fetches the CPU version of `torch`. You could try also specifying `torch==1.4.0+cpu` as a requirement, so that `torchvision` doesn't try to get the regular `torch`, which would be the GPU version. — Michael Jungo, May 21 '20 at 17:53
@MichaelJungo I did that, passing torch and torchvision as separate entries in ```REQUIRED_PACKAGES```. I also omitted the link to the .whl package containing torch on my cloud storage from the ```--package-uris``` argument. I got the following error, which is new: ```Bad model detected with error: "Failed to load model: User-provided package my_package-0.1.tar.gz failed to install: Command '['python-default', '-m', 'pip', 'install', '--target=/tmp/custom_lib', '--no-cache-dir', '-b', '/tmp/pip_builds', '/tmp/custom_code/my_package-0.1.tar.gz']' returned non-zero exit status 1. (Error code: 0)``` — dream_variable, May 21 '20 at 18:07
That looks like `my_package-0.1.tar.gz` does not exist. I don't think that you need to specify `gs://BUCKET/staging-dir/my_package-0.1.tar.gz` in `--package-uris`, that should only be for packages your code depends on (like PyTorch's CPU version). After all, you are about to build that package, there is no way to install it before you've even built it. — Michael Jungo, May 21 '20 at 19:12
@MichaelJungo that's what's supposed to be included according to the [tutorial](https://cloud.google.com/ai-platform/prediction/docs/custom-prediction-routines) and [documentation](https://cloud.google.com/sdk/gcloud/reference/beta/ai-platform/versions/create). I tried to only include the directory and it doesn't work. If I don't specify that at all, then how is my custom package gonna be installed? — dream_variable, May 22 '20 at 09:49
Oh, yes, last time I checked I was looking at the training part, where it can be omitted. It also has a section on building it yourself, which it seems you are required to do for the deployment. Have you done the steps from [Package your Predictor and its dependencies](https://cloud.google.com/ai-platform/prediction/docs/custom-prediction-routines#predictor-tarball)? You need to build it with `python setup.py sdist --formats=gztar` and then copy the generated archive to `gs://BUCKET/staging-dir/my_package-0.1.tar.gz`. — Michael Jungo, May 22 '20 at 10:14
@MichaelJungo I've found a solution. It was to place in the setup.py file the following: ```REQUIRED_PACKAGES = ['torchvision==0.5.0', 'torch @ https://download.pytorch.org/whl/cpu/torch-1.4.0%2Bcpu-cp37-cp37m-linux_x86_64.whl', 'opencv-python', 'facenet-pytorch']```. I then had a different problem with custom class instantiation, but [this](https://medium.com/@aftaabzia9/serverless-machine-learning-deployment-with-pytorch-and-google-cloud-f89775773b6b) article explains it well. So I was able to successfully deploy my model to the AI Platform — dream_variable, May 23 '20 at 15:16

score 2 · Accepted Answer · answered May 23 '20 at 15:19

The solution was to list the following packages in the setup.py file for the custom prediction code:

REQUIRED_PACKAGES = ['torchvision==0.5.0', 'torch @ https://download.pytorch.org/whl/cpu/torch-1.4.0%2Bcpu-cp37-cp37m-linux_x86_64.whl', 'opencv-python', 'facenet-pytorch']
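For readers unfamiliar with the `torch @ <url>` form: it is a direct-reference requirement, which tells pip to install that exact wheel instead of resolving `torch` from PyPI. Below is a minimal sketch of how the same list could be assembled. The helper name `torch_cpu_requirement` and its parameters are hypothetical, introduced only for illustration; the wheel name and host are the ones from the line above. Note that the `+` in the version `1.4.0+cpu` appears percent-encoded as `%2B` in the URL.

```python
from urllib.parse import quote

# Host taken from the requirement line in the answer.
PYTORCH_CPU_INDEX = "https://download.pytorch.org/whl/cpu/"


def torch_cpu_requirement(version="1.4.0+cpu", py_tag="cp37"):
    """Build a 'torch @ <url>' requirement string (hypothetical helper).

    quote() percent-encodes the '+' in the version as '%2B',
    matching the URL used in the answer.
    """
    wheel = "torch-{v}-{t}-{t}m-linux_x86_64.whl".format(v=version, t=py_tag)
    return "torch @ " + PYTORCH_CPU_INDEX + quote(wheel)


# Same entries as in the answer; torchvision 0.5.0 pairs with torch 1.4.0.
REQUIRED_PACKAGES = [
    "torchvision==0.5.0",
    torch_cpu_requirement(),
    "opencv-python",
    "facenet-pytorch",
]
```

In setup.py this list would then be passed as `install_requires=REQUIRED_PACKAGES`, as in the question's original file.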

I then had a different problem with custom class instantiation, but this article explains it well. So I was able to successfully deploy my model to the AI Platform for prediction.
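As a side note, a missing module like the one in the title can often be caught locally before deploying. The following is only a sketch, assuming you install the built `my_package-0.1.tar.gz` into a fresh virtual environment and run the check there; the function name `missing_modules` is made up for this example, and the module list is taken from the imports in predictor.py.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be found
    in the current environment (hypothetical helper)."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Import names used by predictor.py (PIL is provided by Pillow,
# sklearn by scikit-learn).
PREDICTOR_IMPORTS = [
    "facenet_pytorch",
    "torch",
    "torchvision",
    "PIL",
    "sklearn",
    "numpy",
]

if __name__ == "__main__":
    missing = missing_modules(PREDICTOR_IMPORTS)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All predictor imports resolve.")
```

A clean local result does not guarantee the deployment fits within the platform's size limit; it only checks that the dependencies declared in setup.py cover the imports.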
