I am doing a classification task with 300k x 24 inputs and, correspondingly, 300k x 25 outputs. The process dies with the message "Killed"; following this question I found that this means it ran out of memory. I am running on an Amazon c4.large instance (3.75 GB of RAM). I was hoping that removing the large objects labels and vals from the workspace would solve the problem, but it didn't. Any thoughts on how to get around this? The code I am running is below (a rough memory estimate follows it at the end of the post):
library(neuralnet)

numLabels <- 25       # output columns X1..X25
numInput  <- 24       # input columns V1..V24
newThreshold <- 10000 # stopping threshold for the error-function partial derivatives
# 25th col of labels represents the data quality
# Will not be used in training
# Shuffle the rows with a random permutation
randperm <- sample(nrow(vals))
train <- cbind(vals, labels)
train <- train[randperm, ]
rm(labels, vals)  # drop the originals; only the combined frame is needed
gc()              # trigger garbage collection so the freed memory is actually released
# Build the formula programmatically instead of writing out all 49 terms;
# data=train is passed to neuralnet below, so no env argument is needed
f <- as.formula(paste(
  paste0("X", 1:numLabels, collapse = " + "),
  "~",
  paste0("V", 1:numInput, collapse = " + ")
))
nn <- neuralnet(
  formula = f,
  data = train,
  hidden = c(100),        # a single hidden layer of 100 units
  threshold = newThreshold,
  stepmax = 10000,
  rep = 2,
  err.fct = 'ce',         # cross-entropy error
  act.fct = 'logistic',
  linear.output = FALSE,
  lifesign = 'full',
  lifesign.step = 100
)
# Threshold the network outputs at 0.5 and compare element-wise with the
# label columns; note that `:` binds tighter than `+`, hence (numInput + 1)
prediction <- compute(nn, train[, 1:numInput])$net.result > 0.5
print(mean(prediction == train[, (numInput + 1):ncol(train)]))
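
For reference, the rough memory estimate mentioned above (a sketch only; the sizes assume everything is stored as 8-byte doubles, and as far as I can tell neuralnet trains full-batch, so it materializes activations for every row):

# Combined training frame: 300000 rows x 49 numeric columns
print(object.size(train), units = "MB")  # 300000 * 49 * 8 bytes ~ 112 MB

# The hidden layer implies a 300000 x 100 activation matrix per pass,
# about 300000 * 100 * 8 bytes ~ 229 MB, before weights and gradients.
# Several such matrices held at once (and rep = 2) could plausibly push
# the peak past the 3.75 GB available on a c4.large.
gc()  # summary of what R has currently allocated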