I am coming over from Keras to PyTorch and am having a hard time creating graph visualizations of models. In Keras, you can simply call `plot_model()`, and that function will use graphviz to produce a nice graph.
However, in PyTorch it is not so easy. I've seen that a lot of tools are available, such as TorchViz, but they all require that you pass some input into the model. For example:
- This StackOverflow question ("How do I visualize a net in Pytorch?") suggests using the `torchviz` package like so:

  ```python
  import torch
  from torchviz import make_dot

  x = torch.zeros(1, 3, 224, 224, dtype=torch.float, requires_grad=False)
  out = resnet(x)  # resnet: some torchvision model instance
  make_dot(out)
  ```
- The HiddenLayer package (example) also requires a dummy input:

  ```python
  import torch
  import torchvision
  import hiddenlayer as hl

  model = torchvision.models.vgg16()
  g = hl.build_graph(model, torch.zeros([1, 3, 224, 224]))
  ```
So my question is: suppose I download a pre-trained model that has little documentation. How can I determine the dummy input dimensions it needs in order to run these visualization functions?
For example, when I `print()` an AlexNet model, what part of the output below tells me that the input should be `[1, 3, 224, 224]`?
```python
import torchvision

model = torchvision.models.alexnet(pretrained=True)
print(model)
# AlexNet(
#   (features): Sequential(
#     (0): Conv2d(3, 64, kernel_size=(11, 11), stride=(4, 4), padding=(2, 2))
#     (1): ReLU(inplace=True)
#     (2): MaxPool2d(kernel_size=3, stride=2, padding=0, dilation=1, ceil_mode=False)
#     (3): Conv2d(64, 192, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2))
#     (4): ReLU(inplace=True)
#     (5): MaxPool2d(kernel_size=3, stride=2, padding=0, dilation=1, ceil_mode=False)
#     (6): Conv2d(192, 384, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
#     (7): ReLU(inplace=True)
#     (8): Conv2d(384, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
#     (9): ReLU(inplace=True)
#     (10): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
#     (11): ReLU(inplace=True)
#     (12): MaxPool2d(kernel_size=3, stride=2, padding=0, dilation=1, ceil_mode=False)
#   )
#   (avgpool): AdaptiveAvgPool2d(output_size=(6, 6))
#   (classifier): Sequential(
#     (0): Dropout(p=0.5, inplace=False)
#     (1): Linear(in_features=9216, out_features=4096, bias=True)
#     (2): ReLU(inplace=True)
#     (3): Dropout(p=0.5, inplace=False)
#     (4): Linear(in_features=4096, out_features=4096, bias=True)
#     (5): ReLU(inplace=True)
#     (6): Linear(in_features=4096, out_features=1000, bias=True)
#   )
# )
```
The first layer says `Conv2d(3, 64, kernel_size=(11, 11), stride=(4, 4), padding=(2, 2))`. That tells me the input must have 3 channels, but nothing in it mentions 224. How would I know from this printout that I need to create a `[1, 3, 224, 224]` input?
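To be clear, I do know the standard output-size formula for convolutions and pooling, and I can verify in the *forward* direction that 224 is consistent with the printout (`conv_out` below is just a helper I wrote for this sanity check); it's going *backwards* from the printout to 224 that I don't see:

```python
import math

def conv_out(size, kernel, stride, padding, dilation=1):
    # Output-size formula shared by Conv2d and MaxPool2d in the PyTorch docs:
    # floor((size + 2*padding - dilation*(kernel - 1) - 1) / stride) + 1
    return math.floor((size + 2 * padding - dilation * (kernel - 1) - 1) / stride) + 1

# Walking a 224x224 input through AlexNet's feature extractor:
s = 224
s = conv_out(s, 11, 4, 2)  # (0)  Conv2d(3, 64)    -> 55
s = conv_out(s, 3, 2, 0)   # (2)  MaxPool2d        -> 27
s = conv_out(s, 5, 1, 2)   # (3)  Conv2d(64, 192)  -> 27
s = conv_out(s, 3, 2, 0)   # (5)  MaxPool2d        -> 13
s = conv_out(s, 3, 1, 1)   # (6)  Conv2d(192, 384) -> 13
s = conv_out(s, 3, 1, 1)   # (8)  Conv2d(384, 256) -> 13
s = conv_out(s, 3, 1, 1)   # (10) Conv2d(256, 256) -> 13
s = conv_out(s, 3, 2, 0)   # (12) MaxPool2d        -> 6
print(s, 256 * s * s)      # 6 9216 -- matches Linear(in_features=9216)
```

So 224 *checks out*, but that only confirms a size I already guessed; it doesn't extract the size from the model itself.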