TLDR: I want to predict whether a machine will fail based on the most recent set of measurements taken by on-board sensors. But I'm having trouble understanding what the shape of my input tensors should be as well as how to properly flatten my data between the final convolutional layer and the first fully connected layer.
The Data
I have several dozen machines, each with a sensor that takes a measurement at regular intervals. Some machines have already failed, but most have not. The resulting dataset looks something like the example data below, with one row for each machine, showing the 30 most recent sensor measurements as well as a "failure" designation, where 0 indicates that the machine is still operational, and 1 indicates that the machine failed after the measurement taken at time30.
ID time1 time2 time3 time4 time5 time6 time7 time8 time9 time10 time11 time12 time13 time14 time15 time16 time17 time18 time19 time20 time21 time22 time23 time24 time25 time26 time27 time28 time29 time30 failure
0 1 3.085 1.360 2.351 3.858 5.562 3.709 6.423 9.706 5.521 0.045 5.676 6.045 5.540 8.404 2.701 7.969 2.535 5.096 7.949 5.888 9.250 6.608 1.441 2.066 8.885 6.985 1.310 4.245 9.068 3.283 0
1 2 7.938 9.833 5.776 3.218 0.978 4.164 8.079 7.425 5.554 0.259 5.927 5.168 8.751 8.713 5.651 9.342 0.385 6.623 4.348 9.113 9.230 7.134 4.316 4.725 9.258 4.248 6.497 7.354 7.707 2.527 0
2 3 5.946 0.096 1.972 6.362 9.990 6.702 9.683 5.111 2.273 7.581 0.379 5.571 0.274 9.429 3.572 2.032 0.543 0.467 3.028 1.095 0.529 8.780 4.375 7.544 0.754 5.400 4.943 1.821 1.486 2.492 1
3 4 6.793 9.299 1.522 9.307 0.438 9.999 0.481 6.420 3.881 4.933 7.185 4.176 4.224 7.403 9.101 3.300 3.273 0.556 6.421 5.528 9.262 6.160 1.573 9.299 4.307 0.808 4.270 6.886 3.548 4.889 0
4 5 8.470 5.503 7.420 8.363 3.316 1.047 9.695 3.884 2.010 8.353 1.308 7.733 7.898 3.327 2.737 2.858 2.002 5.483 7.750 4.952 2.435 5.980 6.403 0.985 1.591 8.886 7.586 0.062 6.002 1.144 1
The Problem(s)
I want to use PyTorch to create a 1D convolutional neural network that will predict whether a machine is about to fail based on the 30 most recent sensor measurements. While building my CNN, I've run into a couple of points of confusion:
- Understanding the necessary dimensionality of the input tensors: Should my input tensor have the shape of (num_of_samples, 1, 30), or (num_of_samples, 30, 1)?
- Understanding the flattening step after the final convolutional layer and before the first fully connected layer: Many PyTorch CNN examples I've seen include a flattening/reshaping step where the view function is used. I've included such a step in my code below, but I don't know if I'm using it correctly since I often get RuntimeError: mat1 and mat2 shapes cannot be multiplied after this step.
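To make the two points above concrete, here is a minimal sketch, separate from the model in this post, of the layout that nn.Conv1d documents for its input, (batch, channels, length), and of how the flattened size relates to the in_features of the following nn.Linear. The batch size of 5 and the random input are placeholders, not real sensor data.

```python
import torch
import torch.nn as nn

# Placeholder batch: 5 machines, 1 channel (one sensor), 30 time steps.
x = torch.randn(5, 1, 30)

# nn.Conv1d expects input shaped (batch, channels, length).
conv = nn.Conv1d(in_channels=1, out_channels=17, kernel_size=2)
features = conv(x)
print(features.shape)  # torch.Size([5, 17, 29])

# Flatten everything except the batch dimension.
flat = features.view(features.size(0), -1)
print(flat.shape)  # torch.Size([5, 493])

# in_features of the first linear layer must equal channels * length,
# otherwise the matrix multiplication inside nn.Linear fails.
fc = nn.Linear(flat.size(1), 50)
out = fc(flat)
print(out.shape)  # torch.Size([5, 50])
```

With this layout, keeping the batch size first and -1 second in view gives one flattened row per sample.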
What I Have So Far
After splitting the dataframe data into train, validation, and test sets, I can check the dimensionality:
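# Imports assumed by the snippets in this post (inferred from the aliases used)
import math as m
import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

np.shape(X_train)
# out: (38, 30)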
I then add another dimension and convert to arrays:
def extra_layer(arr):
    arr = np.reshape(arr, (len(arr), len(arr[0]), 1))
    return arr
X_train = extra_layer(X_train)
X_validate = extra_layer(X_validate)
X_test = extra_layer(X_test)
y_train = y_train.astype(int)
y_validate = y_validate.astype(int)
y_test = y_test.astype(int)
np.shape(X_train)
# out: (38, 30, 1)
I then reshape the data so each timepoint measurement is the final dimension:
# Reshaping data
reshape = 30
X_train = X_train.reshape(len(X_train), 1, reshape)
X_validate = X_validate.reshape(len(X_validate), 1, reshape)
X_test = X_test.reshape(len(X_test), 1, reshape)
np.shape(X_train)
# out: (38, 1, 30)
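As a side note on the two reshaping steps above, here is a small sketch, using random placeholder values rather than the real X_train, suggesting that with a single sensor the intermediate (38, 30, 1) shape is not needed: inserting a channel axis directly yields the same array.

```python
import numpy as np

# Placeholder data standing in for X_train: 38 machines, 30 readings each.
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 10.0, size=(38, 30))

# Two-step route from the post: (38, 30) -> (38, 30, 1) -> (38, 1, 30).
two_step = X.reshape(len(X), 30, 1).reshape(len(X), 1, 30)

# One-step route: insert a channel axis of size 1 at position 1.
one_step = X[:, np.newaxis, :]

print(one_step.shape)                      # (38, 1, 30)
print(np.array_equal(two_step, one_step))  # True
```

This equivalence relies on there being exactly one channel. With several sensors per machine, reshape would scramble values across channels and a transpose would be needed instead.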
Then I zip my arrays together and build my CNN:
# Zipping data together and storing in trainloader objects
train_set = list(zip(X_train, y_train))
val_set = list(zip(X_validate, y_validate))
test_set = list(zip(X_test, y_test))
print('Done zipping and converting')
# Creating the neural net
class CNN(nn.Module):
    def __init__(self):
        # Outputs
        self.O_1 = 17
        self.O_2 = 18
        self.O_3 = 32
        self.O_4 = 37
        # Kernels
        self.K_1 = 2
        self.K_2 = 1
        self.K_3 = 3
        self.K_4 = 2
        # Pooling
        self.KP_1 = 1
        self.KP_2 = 1
        self.KP_3 = 1
        self.KP_4 = 2
        # Padding
        self.P_1 = 0
        self.P_2 = 0
        self.P_3 = 0
        self.P_4 = 0
        self.conv_1_out = m.floor(((reshape + 2*self.P_1 - self.K_1)/self.KP_1)+1)
        self.conv_2_out = m.floor(((self.conv_1_out + 2*self.P_2 - self.K_2)/self.KP_2)+1)
        self.conv_3_out = m.floor(((self.conv_2_out + 2*self.P_3 - self.K_3)/self.KP_3)+1)
        self.conv_4_out = m.floor(((self.conv_3_out + 2*self.P_4 - self.K_4)/self.KP_4)+1)
        self.conv_linear_out = int(m.floor((self.conv_4_out)*self.O_4))
        self.FN_1 = 50
        super(CNN, self).__init__()
        self.conv1 = nn.Sequential(nn.Conv1d(1, self.O_1, self.K_1), nn.ReLU(),
                                   nn.MaxPool1d(self.KP_1))
        self.conv2 = nn.Sequential(nn.Conv1d(self.O_1, self.O_2, self.K_2), nn.ReLU(),
                                   nn.MaxPool1d(self.KP_2))
        self.conv3 = nn.Sequential(nn.Conv1d(self.O_2, self.O_3, self.K_3), nn.ReLU(),
                                   nn.MaxPool1d(self.KP_3))
        self.conv4 = nn.Sequential(nn.Conv1d(self.O_3, self.O_4, self.K_4), nn.ReLU(),
                                   nn.MaxPool1d(self.KP_4))
        self.fc1 = nn.Linear(self.conv_linear_out, self.FN_1, nn.Dropout(0.2))
        self.fc2 = nn.Linear(self.FN_1, 2)

    def forward(self, x):
        x = x.float()
        x = F.leaky_relu(self.conv1(x))
        x = F.leaky_relu(self.conv2(x))
        x = F.leaky_relu(self.conv3(x))
        x = F.leaky_relu(self.conv4(x))
        x = x.view(len(x), -1) # SHOULD THE -1 BE THE FIRST VALUE?
        x = F.logsigmoid(self.fc1(x))
        x = self.fc2(x)
        return x
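One possible source of the shape mismatch is the size bookkeeping in __init__. The sketch below applies the output-length formula given in the nn.Conv1d and nn.MaxPool1d documentation to the convolution and the pooling layer separately, using the kernel sizes from the class above. The helper names out_length and combined_length are hypothetical, and combined_length only mirrors the single expression used in the post for comparison.

```python
import math


def out_length(length, kernel_size, stride=1, padding=0, dilation=1):
    """Output length of a 1D conv or pooling layer (formula from the PyTorch docs)."""
    return (length + 2 * padding - dilation * (kernel_size - 1) - 1) // stride + 1


def combined_length(length, conv_k, pool_k, padding=0):
    """The single expression from the post, which treats the pool kernel as a stride."""
    return math.floor((length + 2 * padding - conv_k) / pool_k + 1)


# (conv kernel, pool kernel) for the four blocks in the post.
blocks = [(2, 1), (1, 1), (3, 1), (2, 2)]
last_out_channels = 37  # O_4 in the post

length = 30
for conv_k, pool_k in blocks:
    length = out_length(length, conv_k)                 # Conv1d, stride 1
    length = out_length(length, pool_k, stride=pool_k)  # MaxPool1d, stride defaults to kernel size

flattened = length * last_out_channels
print(length, flattened)  # 13 481

# The two calculations agree for the sizes above but not in general.
stepwise = out_length(out_length(28, 2), 2, stride=2)
print(stepwise, combined_length(28, 2, 2))  # 13 14
```

For the hyperparameters shown in this post both calculations give 481 input features, so the mismatch would only appear for other kernel or pooling sizes, for example when these values are being tuned.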
Finally, I train the CNN:
# If gpu is available we will use it
use_cuda = True
# Train
best_accuracy = -float('Inf')
best_params = []
batch_size = 5
trainloader = torch.utils.data.DataLoader(
    train_set, batch_size=batch_size, shuffle=True, num_workers=2)
vldloader = torch.utils.data.DataLoader(
    val_set, batch_size=batch_size, shuffle=True, num_workers=2)
testloader = torch.utils.data.DataLoader(
    test_set, batch_size=batch_size, shuffle=True, num_workers=2)
lr = 0.001
epochs = 250
momentum = 0.7557312793639288
net = CNN()
if use_cuda and torch.cuda.is_available():
    net.cuda()
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(net.parameters(), lr=lr, momentum=momentum)
for epoch in range(epochs):  # loop over the dataset multiple times
    running_loss = 0.0
    for i, data in enumerate(trainloader, 0):
        inputs, labels = data
        if use_cuda and torch.cuda.is_available():
            inputs = inputs.cuda()
            labels = labels.cuda()
        optimizer.zero_grad()
        outputs = net(inputs)
        outputs = outputs.to(dtype=torch.float64)
        labels = labels.to(dtype=torch.long)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()
    print('Finished epoch number ' + str(epoch))
correct = 0
total = 0
with torch.no_grad():
    for data in vldloader:
        inputs, labels = data
        # Same guard as in training, so validation also runs on CPU-only machines
        if use_cuda and torch.cuda.is_available():
            inputs = inputs.cuda()
            labels = labels.cuda()
        outputs = net(inputs)
        _, predicted = torch.max(outputs.data, 1)
        total += len(labels)
        correct += (predicted == labels).sum().item()
print('Accuracy of the network on the validation set: %d %%'
      % (100 * correct / total))
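The validation pass can also be written around a single torch.device object, so the same code runs with or without a GPU. The sketch below is only an illustration: the tiny model, the random inputs, and the random labels are stand-ins, not the CNN or data from this post.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Stand-in model and data, only here to make the sketch runnable.
model = nn.Sequential(nn.Flatten(), nn.Linear(30, 2)).to(device)
inputs = torch.randn(8, 1, 30)
labels = torch.randint(0, 2, (8,))
loader = DataLoader(TensorDataset(inputs, labels), batch_size=4)

correct = 0
total = 0
model.eval()  # switches layers such as dropout to evaluation behaviour
with torch.no_grad():
    for batch_inputs, batch_labels in loader:
        batch_inputs = batch_inputs.to(device)
        batch_labels = batch_labels.to(device)
        predicted = model(batch_inputs).argmax(dim=1)
        total += batch_labels.size(0)
        correct += (predicted == batch_labels).sum().item()

accuracy = 100.0 * correct / total
print(f"Validation accuracy: {accuracy:.1f} %")
```

Moving both the model and each batch with .to(device) avoids separate CUDA checks in the training and validation loops.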
Does the structure of my CNN make sense? Should I alter the dimensions of my input tensors? Am I using the view function correctly or can I alter my code to something that makes more sense?