I am trying to build a predictive model from a principal component analysis (PCA) that I ran on the Turbofan Engine Degradation Simulation Data Set, here called "General propulsion" (https://drive.google.com/drive/folders/1WiGafxzYb2Nv0yCNrXqyzbYBbzWHiVEa?usp=sharing). The dataset contains 20 engines, each with the number of cycles (and other variables) that the engine ran until it broke down. (I'm a student and don't have much experience with RStudio yet, so my code may be a little messy.)
vapply(Motor_gegevens, function(x) length(unique(x)) > 1, logical(1L))
Motor_gegevens <- Motor_gegevens[vapply(Motor_gegevens, function(x) length(unique(x)) > 1, logical(1L))]
With the code above I removed every variable that has the same value in all rows (constant columns) to clean the dataset. I then divided the dataset into a test set and a training set by binding engines 1-5 into the test set and engines 6-20 into the training set.
Motor_test <- rbind(Engine1,Engine2,Engine3,Engine4,Engine5)
Motor_train<- rbind(Engine6,Engine7,Engine8,Engine9,Engine10,Engine11,Engine12,Engine13,Engine14,Engine15,Engine16,Engine17,Engine18,Engine19,Engine20)
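The same split can be written more compactly when the engines are kept in a list. This is only a minimal sketch on made-up data: the list `engines`, the `Engine` id column and `Sensor1` are assumptions for illustration and are not part of the real dataset.

```r
# Toy stand-ins for the real Engine1..Engine20 data frames (values are made up)
set.seed(1)
engines <- lapply(1:20, function(i) {
  n <- 5
  data.frame(Engine = i, Cycle = seq_len(n), Sensor1 = rnorm(n))
})

# Engines 1-5 form the test set, engines 6-20 the training set
Motor_test  <- do.call(rbind, engines[1:5])
Motor_train <- do.call(rbind, engines[6:20])

dim(Motor_test)
dim(Motor_train)
```

With the real data frames already in the workspace, `mget(paste0("Engine", 1:20))` would collect them into such a list.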
Next, I ran a PCA on the training set and plotted the variance explained by the components (15 components explain 98% of the variance).
PCA <- prcomp(Motor_train, scale. = TRUE)
PCA
plot(PCA, type= "l")
biplot(PCA, scale = 0)
std_dev <- PCA$sdev
pr_var <- std_dev^2
propvarex<- pr_var/sum(pr_var)
plot(propvarex, xlab = "PC", ylab = "prop of var", type = "b")
plot(cumsum(propvarex), xlab = "PC", ylab = "cum prop of var", type = "b")
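The number of components needed for a given share of variance can also be read off programmatically instead of from the plot. A minimal sketch, using the built-in `mtcars` data purely as a stand-in for the training set; the names `pca_demo`, `prop_var`, `cum_var` and `n_comp` are made up for this example.

```r
# Built-in dataset used only as a stand-in for Motor_train
pca_demo <- prcomp(mtcars, scale. = TRUE)

# Proportion of variance explained by each component
prop_var <- pca_demo$sdev^2 / sum(pca_demo$sdev^2)
cum_var  <- cumsum(prop_var)

# Smallest number of components whose cumulative share reaches 98%
n_comp <- which(cum_var >= 0.98)[1]
n_comp
```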
I then fitted an rpart model on the training data, predicting Cycle from the principal components.
train.data<- data.frame(Cycle = Motor_train$Cycle, PCA$x)
train.data<- train.data[,1:16]
library(rpart)
rpart.model <- rpart(Cycle ~ ., data = train.data, method = "anova")
rpart.model
Finally, I tried to predict the cycles of the test set with the rpart model, but the results are not what I hoped for:
test.data<- predict(PCA, newdata = Motor_test)
test.data<- as.data.frame(test.data)
test.data<- test.data[,1:15]
rpart.prediction<- predict(rpart.model, test.data)
head(rpart.prediction)
       1        2        3        4        5        6
31.74074 31.74074 31.74074 31.74074 31.74074 31.74074
The method I used (or the script I wrote) did not give me the right results. The desired result is a model that tells me how many cycles an engine can still make until it breaks down, so I'm looking for a way to achieve this. I couldn't find a working method online or in any Stack Overflow question, so I'm trying my luck with all of you active data scientists here on Stack!
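To make the desired output concrete, here is a minimal sketch on made-up data of what "cycles remaining until breakdown" could look like as a target. Everything in it is an assumption for illustration: the `Engine` id column, the `Sensor1`..`Sensor3` columns, the toy values, and the choice to define the remaining cycles (`RUL`) as each engine's last observed cycle minus the current cycle. Unlike the script above, the sketch runs the PCA on the sensor columns only, so that neither `Cycle` nor the target ends up inside the components.

```r
library(rpart)

# Made-up data: 6 engines, each running a different number of cycles until failure
set.seed(42)
toy <- do.call(rbind, lapply(1:6, function(id) {
  n   <- 40 + 10 * id                           # cycles until this engine fails
  cyc <- seq_len(n)
  data.frame(
    Engine  = id,
    Cycle   = cyc,
    Sensor1 = cyc / n + rnorm(n, sd = 0.05),     # drifts upward as failure nears
    Sensor2 = (cyc / n)^2 + rnorm(n, sd = 0.05),
    Sensor3 = rnorm(n)                           # pure noise
  )
}))

# Remaining cycles = last observed cycle of that engine minus the current cycle
toy$RUL <- ave(toy$Cycle, toy$Engine, FUN = max) - toy$Cycle

train   <- toy[toy$Engine > 2, ]
test    <- toy[toy$Engine <= 2, ]
sensors <- c("Sensor1", "Sensor2", "Sensor3")

# PCA on the sensor columns only, so neither Cycle nor RUL leaks into the components
pca <- prcomp(train[, sensors], scale. = TRUE)

train_pc <- data.frame(RUL = train$RUL, pca$x)
test_pc  <- as.data.frame(predict(pca, newdata = test[, sensors]))

fit  <- rpart(RUL ~ ., data = train_pc, method = "anova")
pred <- predict(fit, newdata = test_pc)

# A single regression tree returns one value per leaf, so predictions repeat
length(unique(pred))
```

Because a regression tree predicts one constant per leaf, neighbouring rows often get identical predictions, so looking only at `head()` of the predictions can be misleading; `table(pred)` or `summary(pred)` shows the full picture.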
Can anyone help me out?
Thanks in advance