I tried following the tutorial from PyTorch here: https://pytorch.org/tutorials/beginner/blitz/cifar10_tutorial.html#sphx-glr-beginner-blitz-cifar10-tutorial-py.

Full code is here:

import torch
import torchvision
import torchvision.transforms as transforms
import matplotlib.pyplot as plt
import numpy as np
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim


# Loading and normalizing CIFAR10
transform = transforms.Compose(
    [transforms.ToTensor(),
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])

trainset = torchvision.datasets.CIFAR10(root='./data', train=True,
                                        download=True, transform=transform)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=4,
                                          shuffle=True, num_workers=2)

testset = torchvision.datasets.CIFAR10(root='./data', train=False,
                                       download=True, transform=transform)
testloader = torch.utils.data.DataLoader(testset, batch_size=4,
                                         shuffle=False, num_workers=2)

classes = ('plane', 'car', 'bird', 'cat',
           'deer', 'dog', 'frog', 'horse', 'ship', 'truck')


# Shows training images, DOESN'T WORK

def imshow(img):
    img = img / 2 + 0.5     # unnormalize
    npimg = img.numpy()
    plt.imshow(np.transpose(npimg, (1, 2, 0)))
    plt.show()


# get some random training images
dataiter = iter(trainloader)
images, labels = next(dataiter)

# show images
imshow(torchvision.utils.make_grid(images))
# print labels
print(' '.join('%5s' % classes[labels[j]] for j in range(4)))


# define a convolutional neural network
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = x.view(-1, 16 * 5 * 5)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x


net = Net()

# Define a loss function and optimizer
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(net.parameters(), lr=0.001, momentum=0.9)

# Train the network
for epoch in range(2):  # loop over the dataset multiple times

    running_loss = 0.0
    
    # DOESN'T WORK
    for i, data in enumerate(trainloader, 0):
        # get the inputs; data is a list of [inputs, labels]
        inputs, labels = data

        # zero the parameter gradients
        optimizer.zero_grad()

        # forward + backward + optimize
        outputs = net(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()

        # print statistics
        running_loss += loss.item()
        if i % 2000 == 1999:    # print every 2000 mini-batches
            print('[%d, %5d] loss: %.3f' %
                  (epoch + 1, i + 1, running_loss / 2000))
            running_loss = 0.0

print('Finished Training')


# save trained model
PATH = './cifar_net.pth'
torch.save(net.state_dict(), PATH)

# test the network on the test data
dataiter = iter(testloader)
images, labels = next(dataiter)

# print images
imshow(torchvision.utils.make_grid(images))
print('GroundTruth: ', ' '.join('%5s' % classes[labels[j]] for j in range(4)))


# load back saved model
net = Net()
net.load_state_dict(torch.load(PATH))

# see what the neural network thinks these examples above are:
outputs = net(images)

# index of the highest energy
_, predicted = torch.max(outputs, 1)

print('Predicted: ', ' '.join('%5s' % classes[predicted[j]]
                              for j in range(4)))

# accuracy on the whole dataset
correct = 0
total = 0
with torch.no_grad():
    for data in testloader:
        images, labels = data
        outputs = net(images)
        _, predicted = torch.max(outputs.data, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()

print('Accuracy of the network on the 10000 test images: %d %%' % (
    100 * correct / total))

# classes that performed well vs classes that didn't perform well
class_correct = list(0. for i in range(10))
class_total = list(0. for i in range(10))
with torch.no_grad():
    for data in testloader:
        images, labels = data
        outputs = net(images)
        _, predicted = torch.max(outputs, 1)
        c = (predicted == labels).squeeze()
        for i in range(4):
            label = labels[i]
            class_correct[label] += c[i].item()
            class_total[label] += 1


for i in range(10):
    print('Accuracy of %5s : %2d %%' % (
        classes[i], 100 * class_correct[i] / class_total[i]))


if __name__ == '__main__':
    torch.multiprocessing.freeze_support()

However, I got this error:

An attempt has been made to start a new process before the
        current process has finished its bootstrapping phase.

        This probably means that you are not using fork to start your
        child processes and you have forgotten to use the proper idiom
        in the main module:

            if __name__ == '__main__':
                freeze_support()
                ...

        The "freeze_support()" line can be omitted if the program
        is not going to be frozen to produce an executable.

I'm just trying to run this as a regular Python file. When I added

if __name__ == '__main__':
    freeze_support()

to the end of my file, I still got the error.

desertnaut
  • Does this answer your question? [RuntimeError on windows trying python multiprocessing](https://stackoverflow.com/questions/18204782/runtimeerror-on-windows-trying-python-multiprocessing) – AMC Nov 03 '20 at 03:20
  • I'm running on Mac. I'm not exactly sure what I'm supposed to do however. (This is the first time I've used PyTorch, and I'm somewhat new to Python as well). – What_Is_CoMpUtErScIeNcE Nov 03 '20 at 04:15
  • This seems to be a windows issue. I have it too, not sure how to solve it yet. – Just_Alex Dec 10 '20 at 04:22

4 Answers

9

To anyone else with this issue, I believe you need to define a main function and run the training there. Then add:

if __name__ == '__main__':
    main()

at the end of the Python file.

This fixed the freeze_support() issue for me on a different PyTorch training program.
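As a rough illustration of the structure described above, here is a minimal sketch. It is not the tutorial's code: the random tensors standing in for CIFAR10, the one-layer classifier, and the helper names make_loader and train are all assumptions chosen to keep the example short. The point is only that nothing that iterates over a DataLoader with num_workers > 0 runs at module top level, so a worker process that re-imports the file does not start the training again.

```python
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, TensorDataset


def make_loader(num_workers):
    # Assumption: 16 random 3x32x32 tensors stand in for CIFAR10 images.
    images = torch.randn(16, 3, 32, 32)
    labels = torch.randint(0, 10, (16,))
    return DataLoader(TensorDataset(images, labels), batch_size=4,
                      shuffle=True, num_workers=num_workers)


def train(loader):
    # Tiny linear classifier (hypothetical), only to keep the sketch short.
    net = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))
    criterion = nn.CrossEntropyLoss()
    optimizer = optim.SGD(net.parameters(), lr=0.001, momentum=0.9)
    batches = 0
    for inputs, labels in loader:  # worker processes are started here
        optimizer.zero_grad()
        loss = criterion(net(inputs), labels)
        loss.backward()
        optimizer.step()
        batches += 1
    return batches


def main():
    loader = make_loader(num_workers=2)
    print('Finished Training after', train(loader), 'batches')


if __name__ == '__main__':
    # Only the process that was launched directly calls main(); a worker
    # that re-imports this module sees a different __name__ and skips it.
    main()
```

Placing the guard at the end of a file whose training loop is still at module top level does not help, because the loop has already run by the time the guard is reached.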

1

Set the num_workers parameter to 0 for both the train and test DataLoader:

trainloader = torch.utils.data.DataLoader(trainset, batch_size=4,
                                          shuffle=True, num_workers=0)

testloader = torch.utils.data.DataLoader(testset, batch_size=4,
                                         shuffle=False, num_workers=0)
melqkiades
  • It would be great if you explain what this parameter is about, especially to novice. Seems that your answer is the best here. – A.Ametov Mar 10 '23 at 15:29
  • I am no expert, but according to the PyTorch documentation (https://pytorch.org/docs/stable/data.html), num_workers is how many subprocesses to use for data loading; 0 means that the data will be loaded in the main process (default: 0). I think the freeze_support issue is related to a bug in the early PyTorch releases that supported the arm64 architecture (M1, M2 processors). For more info, see this issue: https://github.com/pytorch/pytorch/issues/70344 – melqkiades Mar 21 '23 at 16:36
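To make the meaning of num_workers concrete, the sketch below asks each sample where it was loaded. ProbeDataset and worker_ids are hypothetical names invented for this illustration; torch.utils.data.get_worker_info() is the real PyTorch call, and it returns None when it is called in the main process. With num_workers=0 no subprocess is created, which is why the bootstrapping error cannot occur in that configuration.

```python
import torch
from torch.utils.data import DataLoader, Dataset, get_worker_info


class ProbeDataset(Dataset):
    """Hypothetical dataset: every item records which worker loaded it."""

    def __len__(self):
        return 8

    def __getitem__(self, idx):
        info = get_worker_info()  # None when running in the main process
        worker_id = -1 if info is None else info.id
        return torch.tensor(idx), torch.tensor(worker_id)


def worker_ids(num_workers):
    # Collect the set of worker ids that produced the samples.
    loader = DataLoader(ProbeDataset(), batch_size=4, num_workers=num_workers)
    ids = set()
    for _, wid in loader:
        ids.update(wid.tolist())
    return ids


if __name__ == '__main__':
    # -1 is this sketch's marker for "loaded in the main process".
    print(worker_ids(0))  # prints {-1}
```

Calling worker_ids with a positive value (inside the main guard) should report nonnegative worker ids instead. The trade-off of num_workers=0 is that data loading no longer overlaps with training, which may be slower on larger datasets.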
0

The following worked for me:

  1. Use the spawn start method:

import torch.multiprocessing as mp

mp.set_start_method('spawn', force=True)

force=True is essential; without it I got another error saying that the context has already been set.

  2. Put the main guard (if __name__ == '__main__':) on the very first line, even before the imports. Many answers on Stack Overflow show that the start() and join() calls should be inside the main guard, and that works well. But I am using several scripts and modules, so I guess the proper main was not being identified, and I had to put the guard on the first line of the first file.
devil in the detail
0

On macOS (M1 Mac mini, torch version 1.13.1), adding this at the top of the script worked for me, without defining a main:

# after `import torch`:    

import torch.multiprocessing as mp

mp.set_start_method('fork', force=True)
Kevin Newman
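The two start-method answers above can be compared with a small sketch. It uses the standard-library multiprocessing module, on the assumption that torch.multiprocessing exposes the same functions, since PyTorch documents it as a wrapper around that module. The helper name pick_start_method is hypothetical. For context, the Python documentation states that spawn has been the default start method on macOS since Python 3.8 and that fork should be considered unsafe there, so forcing fork may work but is best treated as a workaround.

```python
import multiprocessing as mp


def pick_start_method(preferred='fork'):
    """Return `preferred` if this platform offers it, otherwise 'spawn'."""
    # 'spawn' is available on every platform; 'fork' is not (e.g. Windows).
    available = mp.get_all_start_methods()
    return preferred if preferred in available else 'spawn'


if __name__ == '__main__':
    print('default start method:', mp.get_start_method())
    print('available start methods:', mp.get_all_start_methods())
    # force=True replaces a start method that has already been fixed.
    mp.set_start_method(pick_start_method('fork'), force=True)
    print('now using:', mp.get_start_method())
```

With spawn, each child process imports the main module again, which is why the main guard matters; with fork, the child inherits the parent's memory and the module is not re-imported.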