I am trying to use the pls package to analyse my data in R.
My data is similar to the gasoline data: it contains many columns of UV data (at different wavelengths) and one column of alum data. The gasoline data contains a numeric vector (octane) and a matrix with 401 columns (NIR). It seems the NIR data is treated as a single group.
I want to format my data just like the gasoline data and use code similar to the following:
library(pls)
data("gasoline")
gasTrain <- gasoline[1:50, ]  # training subset, split as in the pls vignette
gas1 <- plsr(octane ~ NIR, ncomp = 10, data = gasTrain, validation = "LOO")
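For reference, the layout described above can be checked directly. This is a minimal sketch that assumes the pls package is installed; it only inspects the bundled gasoline data and does not fit a model.

```r
# Inspect how the gasoline data frame is laid out.
# Assumption: the pls package is installed; otherwise nothing is run.
if (requireNamespace("pls", quietly = TRUE)) {
  data("gasoline", package = "pls")
  print(names(gasoline))        # top-level columns of the data frame
  print(dim(gasoline$NIR))      # the NIR block is stored as one matrix column
  str(gasoline, max.level = 1)  # compact overview of the structure
}
```

The point to notice is that the data frame has only two top-level columns, one of which is itself a matrix.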
A small subset of my data is shown in the dput() output at the end of this post.
I have tried:
library(readxl)
Data <- read_excel("test.xlsx")
x = as.matrix(Data[,1:6])
y = Data[,7]
df1 <- data.frame(x,y)
but this did not produce a data frame structured like the gasoline data: data.frame() splits the matrix x back into separate columns instead of keeping it as one matrix column.
Please help me format my data like the gasoline data, so that I can use the pls code on it and predict alum from the UV data. Any suggestions are welcome. Many thanks. :)
The gasoline data comes from the pls package in R.
I used the dput() function to show my data below.
dput(head(Data))
structure(list(`UV. 200 nm` = c(35.0310061349693, 34.5507472222222,
34.3612970711297, 33.942698457223, 33.7440041666667, 33.5717955493741
), `UV. 222.5 nm` = c(34.3149110429448, 33.8141833333333, 33.6073877266388,
33.181190743338, 32.9606347222222, 32.7796870653686), `UV. 225 nm` = c(33.4781748466258,
32.9576319444444, 32.7334881450488, 32.2993730715287, 32.0620333333333,
31.870173852573), `UV. 227.5 nm` = c(32.7270429447853, 32.1803916666667,
31.9470181311018, 31.5060967741936, 31.2553597222222, 31.0520792767733
), `UV. 230 nm` = c(32.0851104294479, 31.5236361111111, 31.2877782426778,
30.8468849929874, 30.586125, 30.3832002781641), `UV. 232.5 nm` = c(31.1708558282209,
30.6077847222222, 30.3719414225941, 29.9375497896213, 29.6742291666667,
29.4762865090403), Alum = c(76.000324025669, 75.95384102484,
75.9992186218653, 75.9955211469609, 75.9996022222152, 76.0093745773557
)), row.names = c(NA, -6L), class = c("tbl_df", "tbl", "data.frame"
))
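For illustration, here is a minimal sketch of one way the target layout could be built, assuming the goal is a data frame with a numeric Alum column and a single matrix column holding all UV wavelengths. The toy Data object (three UV columns, values rounded from the dput() output above) and the column name UV are stand-ins chosen for this example, not part of the original data. Wrapping the matrix in I() is what stops data.frame() from splitting it into separate columns.

```r
# Toy stand-in for the spreadsheet: 6 rows, 3 of the UV columns, plus Alum.
# Values are rounded from the dput() output; the real data has more columns.
Data <- data.frame(
  `UV. 200 nm`   = c(35.03, 34.55, 34.36, 33.94, 33.74, 33.57),
  `UV. 222.5 nm` = c(34.31, 33.81, 33.61, 33.18, 32.96, 32.78),
  `UV. 225 nm`   = c(33.48, 32.96, 32.73, 32.30, 32.06, 31.87),
  Alum           = c(76.00, 75.95, 76.00, 76.00, 76.00, 76.01),
  check.names = FALSE
)

# Every column except the response is treated as a UV predictor.
uv_cols <- setdiff(names(Data), "Alum")

# I() protects the matrix so that it stays a single column named UV.
df1 <- data.frame(
  Alum = Data$Alum,
  UV   = I(as.matrix(Data[, uv_cols]))
)

str(df1)  # two top-level columns: Alum and the UV matrix

# Fit only if the pls package is available (assumption: it is installed).
# ncomp = 2 is an arbitrary choice for this tiny example.
if (requireNamespace("pls", quietly = TRUE)) {
  fit <- pls::plsr(Alum ~ UV, ncomp = 2, data = df1, validation = "LOO")
}
```

With the real data, the same pattern would apply to the tibble returned by read_excel(), since Data$Alum and as.matrix(Data[, uv_cols]) work on tibbles as well.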