So I need to do Principal Component Regression (PCR) with cross-validation, and I could not find a Python package that does it. I wrote my own PCR class, but when tested against R's pls package it performs significantly worse and is much slower on high-dimensional data (~50000 features). I am still not sure why, but that is another question. Because all of my other code is in Python, and in the interest of saving time, I decided the best way might be to write an R function that uses the pls package. Here is the function:
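R_pls <- function(X_train, y_train, X_test) {
  library(pls)
  # Training data: keep X as a single matrix column so the formula y ~ X works
  X <- as.matrix(X_train)
  y <- as.matrix(y_train)
  tdata <- data.frame(y, X = I(X))
  # Fit PCR with cross-validation
  REGmodel <- pcr(y ~ X, scale = FALSE, data = tdata, validation = "CV")
  # Extract the RMSEP values
  B <- RMSEP(REGmodel)
  C <- B[[1]]
  q <- length(C)
  degs <- c(1:q)
  # Keep the even-indexed entries, then drop the first one (intercept-only model)
  allvals <- C[degs %% 2 == 0]
  allvals <- allvals[-1]
  # Number of components with the smallest RMSEP
  comps <- which.min(allvals)
  # Predict on the test set
  xt <- as.matrix(X_test)
  ndata <- data.frame(X = I(xt))
  ypred_test <- as.data.frame(predict(REGmodel, ncomp = comps, newdata = ndata, se.fit = TRUE))
  # Predict on the training set
  ntdata <- data.frame(X = I(X))
  ypred_train <- as.data.frame(predict(REGmodel, ncomp = comps, newdata = ntdata, se.fit = TRUE))
  data_out <- list(ypred_test = ypred_test, ypred_train = ypred_train)
  return(data_out)
}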
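For comparison, here is a minimal NumPy sketch of what I understand the R function to be doing: fit PCR, pick the number of components with the lowest cross-validated RMSEP, then predict on the training and test sets. The names `pcr_cv`, `_fit_all`, `max_comp`, `n_folds` and `seed` are made up for this illustration and are not part of pls or any other library. It assumes a single response, no scaling, randomly assigned folds, and that the centered training matrix has at least `max_comp` nonzero singular values, so the numbers will not match pls exactly.

```python
import numpy as np


def _fit_all(X, y, max_comp):
    """Fit PCR for 1..max_comp components at once (hypothetical helper).

    Returns (x_mean, y_mean, B); column k-1 of B holds the regression
    coefficients of the model that uses the first k principal components.
    """
    x_mean, y_mean = X.mean(axis=0), y.mean()
    # SVD of the centered data: the scores are U * s, the loadings are rows of Vt
    U, s, Vt = np.linalg.svd(X - x_mean, full_matrices=False)
    k = max_comp
    # The scores are orthogonal, so each component is regressed independently
    gamma = (U[:, :k].T @ (y - y_mean)) / s[:k]
    # Cumulative sum over components gives the coefficients for 1..k components
    B = np.cumsum(Vt[:k].T * gamma, axis=1)
    return x_mean, y_mean, B


def pcr_cv(X_train, y_train, X_test, max_comp=10, n_folds=10, seed=0):
    """PCR with the number of components chosen by k-fold CV RMSEP (sketch)."""
    X = np.asarray(X_train, dtype=float)
    y = np.asarray(y_train, dtype=float).ravel()
    X_new = np.asarray(X_test, dtype=float)
    n, p = X.shape

    # Random fold assignment
    folds = np.array_split(np.random.default_rng(seed).permutation(n), n_folds)
    largest_fold = max(len(f) for f in folds)
    # Centered data with m rows has rank at most m - 1
    max_comp = min(max_comp, p, n - largest_fold - 1)

    sq_err = np.zeros(max_comp)
    for held_out in folds:
        mask = np.ones(n, dtype=bool)
        mask[held_out] = False
        x_mean, y_mean, B = _fit_all(X[mask], y[mask], max_comp)
        # One column of predictions per number of components
        pred = y_mean + (X[held_out] - x_mean) @ B
        sq_err += ((pred - y[held_out, None]) ** 2).sum(axis=0)

    rmsep = np.sqrt(sq_err / n)
    ncomp = int(np.argmin(rmsep)) + 1  # components are counted from 1

    # Refit on all training data with the selected number of components
    x_mean, y_mean, B = _fit_all(X, y, ncomp)
    beta = B[:, -1]
    return {
        "ncomp": ncomp,
        "rmsep": rmsep,
        "ypred_train": y_mean + (X - x_mean) @ beta,
        "ypred_test": y_mean + (X_new - x_mean) @ beta,
    }
```

Calling `pcr_cv(X_train, y_train, X_test)` returns a dict whose `ypred_test` and `ypred_train` entries play the same role as the list returned by `R_pls`. Doing one SVD per fold and reusing it for every number of components avoids refitting the model once per component count.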
I have found a good amount of information on how to access R's built-in functions from Python, but cannot really find anything for this situation, where the function is one I defined myself. So I tried the following:
import rpy2.robjects as ro
prs=ro('R_pls')
where R_pls is the R function above. This produces
TypeError: 'module' object is not callable.
Any idea how I might get this to work? I am also open to suggestions if there is a better method.
Thanks