I am working on principal components regression. My dataset consists of 150 variables and 60 observations; I am aware that I should have more observations than variables. I ran a PCA on the dataset and retained 9 factors. Some factors contain variables with both positive and negative loadings. I then ran a multiple regression of the dependent variable on the factor scores, and the regression coefficients are also a mix of positive and negative. My question is: how do I combine the factor loadings with the regression coefficients when their signs are mixed?
For example: factor 1 has regression coefficient -0.17, with factor loadings 0.4 for var1, -0.3 for var3 and -0.22 for var7. Factor 2 has regression coefficient 0.28, with factor loadings -0.21 for var2, 0.4 for var3 and -0.3 for var6.
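To make the example concrete, here is a minimal sketch of how I currently think the two sets of signs could be combined. Because the scores are the standardized variables multiplied by the loadings, regressing y on the scores implies a coefficient for each standardized variable equal to the sum over factors of (loading times regression coefficient). The numbers are the ones from my example; setting every loading I did not list to 0 is an assumption made only for illustration, and the names gamma, P and beta are my own.

```r
# Regression coefficients of y on the factor scores (values from my example)
gamma <- c(F1 = -0.17, F2 = 0.28)

# Loadings: rows are variables, columns are factors.
# Assumption: loadings not listed in the example are treated as 0.
P <- matrix(0, nrow = 5, ncol = 2,
            dimnames = list(c("var1", "var2", "var3", "var6", "var7"),
                            c("F1", "F2")))
P["var1", "F1"] <-  0.40
P["var3", "F1"] <- -0.30
P["var7", "F1"] <- -0.22
P["var2", "F2"] <- -0.21
P["var3", "F2"] <-  0.40
P["var6", "F2"] <- -0.30

# Implied coefficient per standardized variable:
# beta_j = sum over factors k of P[j, k] * gamma[k]
beta <- drop(P %*% gamma)
round(beta, 4)
```

Under these assumptions var3 ends up with a positive implied coefficient (0.163): its negative loading on factor 1 is multiplied by a negative coefficient, and its positive loading on factor 2 by a positive one. var1 gets -0.068, so a higher var1 would go with a lower y.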
My goal is to sort my 150 variables into groups, name these groups, and explain which groups cause a higher or lower value of y. I also want to know whether the variables in those groups then increase or decrease. So far I have standardized my x variables, used parallel analysis to decide how many factors to retain, and applied PCA with the following code:
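library(chemometrics)         # assumption: nipals() returning $T and $P is the chemometrics version
pca <- nipals(xVars, a = 9)   # run NIPALS once and reuse the result
scores <- pca$T               # factor scores (observations x 9)
loadings <- pca$P             # factor loadings (variables x 9)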
I then regress y on the factor scores, where x1 to x9 are the scores of my nine factors:
fit <- lm(y ~ x1 + x2 + x3 + x4 + x5 + x6 + x7 + x8 + x9)
The summary of my model gives the coefficients. How can I combine these coefficients with the corresponding factor loadings?
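For reference, this is a self-contained sketch of the whole pipeline as I understand it, including the step I am unsure about (mapping the score coefficients back to the original variables). It is not my actual analysis: the data are simulated, and it uses base R's prcomp() instead of nipals() so that it runs without extra packages. All object names here are placeholders.

```r
set.seed(1)
n <- 60; p <- 150; k <- 9

# Simulated stand-ins for my standardized xVars and my y
X <- scale(matrix(rnorm(n * p), nrow = n, ncol = p))
y <- rnorm(n)

# PCA on the already standardized data (prcomp swapped in for nipals)
pca <- prcomp(X, center = FALSE, scale. = FALSE)
P <- pca$rotation[, 1:k]        # loadings, p x k
scores <- pca$x[, 1:k]          # scores, n x k, equal to X %*% P

# Regression of y on the factor scores
fit <- lm(y ~ scores)
gamma <- coef(fit)[-1]          # one coefficient per factor, intercept dropped

# Implied coefficients on the standardized original variables
beta <- drop(P %*% gamma)

# Check: predictions from the original variables match the score regression
fitted_from_x <- drop(coef(fit)[1] + X %*% beta)
all.equal(fitted_from_x, unname(fitted(fit)))
```

If this is right, the sign of each entry of beta would tell me whether an increase in that standardized variable goes with a higher or lower predicted y, with the mixed signs of loadings and coefficients already multiplied out.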
Data:
y x1 x2 x3 x4 x5
-1,392 0,033 4,471 0,038 0,148 2,208
2,740 0,066 52,836 0,041 0,526 0,186
-0,066 0,219 10,559 0,132 0,488 0,230
Factor loadings:
F1 F2 F3 F4
1 0,10 0,07 0,16 0,08
2 0,05 -0,03 -0,01 -0,22
3 0,14 0,06 0,05 0,01
4 0,12 -0,08 -0,01 -0,03
5 0,12 -0,12 -0,03 0,07
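For the grouping step, one simple rule I have considered (an assumption on my part, not something I know to be the recommended approach) is to assign each variable to the factor on which it has the largest absolute loading. The sketch below applies that rule to the excerpt of loadings shown above; the labels var1 to var5 are placeholders for the row numbers.

```r
# Loadings excerpt from above (decimal commas converted to points)
loadings <- matrix(c(0.10,  0.07,  0.16,  0.08,
                     0.05, -0.03, -0.01, -0.22,
                     0.14,  0.06,  0.05,  0.01,
                     0.12, -0.08, -0.01, -0.03,
                     0.12, -0.12, -0.03,  0.07),
                   ncol = 4, byrow = TRUE,
                   dimnames = list(paste0("var", 1:5), paste0("F", 1:4)))

# Index of the factor with the largest absolute loading per variable.
# Ties (row 5: 0.12 on F1 and -0.12 on F2) go to the first factor.
dominant <- apply(abs(loadings), 1, which.max)

groups <- data.frame(
  variable = rownames(loadings),
  factor   = colnames(loadings)[dominant],
  loading  = loadings[cbind(seq_len(nrow(loadings)), dominant)]  # signed loading
)

# Variables listed per factor
split(groups$variable, groups$factor)
```

The tie in row 5 shows one weakness of this rule: a variable can load equally strongly on two factors, so a hard assignment to a single group may be misleading.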