
I'm using principal component analysis (PCA) in R on data read from a CSV file in which each variable represents a year, from 1 to 32.

Here is the code:

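```r
library(expm)  # assumed source of sqrtm(); the question does not say which package was loaded

# Drop the first column of context2 and keep the variables to analyse
Xdata <- context2[2:ncol(context2)]
head(Xdata)
# PCA; loadings of the first component, in percent
model5 <- prcomp(Xdata)
model5$rotation[, 1] * 100
screeplot(model5, type = "lines")
# Scores of each row on the first principal component
factor <- model5$x[, 1]
context2$factor <- factor
# Standardize the scores (one column, so ncol = 1 is enough; the original used nrow = 651)
factor2 <- matrix(factor, ncol = 1)
factor <- factor2 %*% solve(sqrtm(crossprod(factor2))) * sqrt(nrow(factor2))
# Check: a 1 x 1 matrix equal to 1
crossprod(factor) / nrow(factor)
```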
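For a single column the `sqrtm()` step is not strictly needed: `crossprod(factor2)` is a 1 × 1 matrix holding the sum of squared scores, its matrix square root is the ordinary square root, and the line therefore divides each score by its root mean square. A minimal sketch on simulated data (the scores below are made up for illustration, not the asker's):

```r
# Simulated stand-in for the PC1 scores (hypothetical data)
set.seed(1)
scores <- rnorm(651)
scores <- scores - mean(scores)  # PCA scores are centred, so mimic that

# Single-column version of the standardization above:
# crossprod() of an n x 1 matrix is 1 x 1 (the sum of squares),
# so solve(sqrtm(...)) reduces to 1 / sqrt(sum of squares)
m <- matrix(scores, ncol = 1)
std_matrix <- m %*% solve(sqrt(crossprod(m))) * sqrt(nrow(m))

# Equivalent vector form: divide by the root mean square
std_vector <- scores / sqrt(mean(scores^2))

all.equal(as.vector(std_matrix), std_vector)  # TRUE
crossprod(std_matrix) / nrow(std_matrix)       # 1 x 1 matrix, 1 up to rounding
```

Because the scores are centred, this is almost the same as `scale()`, which divides by the sample standard deviation instead, so the two differ by a factor of sqrt(n / (n - 1)).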

All of the code above runs without errors, but I need to find the years for which the standardized factor is less than -2.58. When I view the standardized factor in R I can spot values below -2.58, but the output only lists the principal component 1 values, without the years they belong to, like this: https://i.stack.imgur.com/HVxNC.jpg. How do I get the years where the standardized factor is less than -2.58?
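Part of the difficulty is which PCA output carries the years. `prcomp()` returns one loading per input column in `rotation` and one score per row in `x`; the standardized factor is built from the scores (`model5$x[, 1]`), so its entries correspond to rows, not to the year columns. A minimal sketch on simulated data (the column names `y1` to `y5` are hypothetical stand-ins for year columns):

```r
# Simulated stand-in data: 20 rows, 5 columns named like years (hypothetical)
set.seed(7)
toy <- as.data.frame(matrix(rnorm(20 * 5), ncol = 5))
names(toy) <- paste0("y", 1:5)
pca <- prcomp(toy)

# Loadings: one value per input column, named after the columns
pc1_loadings <- pca$rotation[, 1]
length(pc1_loadings)  # 5
names(pc1_loadings)   # "y1" "y2" "y3" "y4" "y5"

# Scores: one value per row (observation), named by row, not by column
pc1_scores <- pca$x[, 1]
length(pc1_scores)    # 20
```

If the years are the columns, year labels come from the loadings; if each row belongs to a year, they have to come from a column kept alongside the scores (for example the first column of `context2`, which is dropped from `Xdata`; this is an assumption about the data layout).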

edited by user2554330 · asked by cguitarw
  • I don't understand your question. Could you please add a [minimal reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) including sample data, and the expected output? Is this about `subset`ting data based on the values from a column? It's difficult to give specific advice without any sample data/code. – Maurits Evers Dec 06 '17 at 00:42
  • I need to subset the data for factor which only has 1 variable which is PC1, but I need to subset it for years where the value of the standardized factor is less than -2.58. The standardized factor does not show year values though. – cguitarw Dec 06 '17 at 00:46
  • Still not clear without any data or reproducible code. If I understand you correctly you want to subset your source data depending on the values of the (standardised) PCA-based *loadings* of the observations? We need (part of) your source data. Best to edit your original question and add more details. – Maurits Evers Dec 06 '17 at 00:51

1 Answer


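It is not clear which variable contains the standardized factor, nor which variable contains the years. The standardized factor appears to be `factor` as reassigned on the second-to-last line of your code. The last line,

```r
crossprod(factor)/nrow(factor)
```

only checks that the standardized values have a mean square of 1; it returns a single 1 × 1 value, so it cannot be used to pick out years.

If the year variable and the standardized factor are in the same order (that is, the year in the nth position corresponds to the factor value in the same position), you can use base R indexing to get the years. For example:

```r
# `year` is assumed to be a vector aligned row-by-row with `factor`
year[which(as.vector(factor) < -2.58)]
```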
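A self-contained sketch of the indexing step on made-up data: a hypothetical `year` vector aligned with the standardized scores, which are stored as an n × 1 matrix, like `factor` in the question. The matrix is flattened with `as.vector()` before comparing.

```r
# Hypothetical inputs: one year label and one standardized score per row
year <- c(1, 2, 3, 4, 5)
std_factor <- matrix(c(0.5, -3.1, 1.2, -2.6, -2.0), ncol = 1)

# Logical test on the flattened scores; which() turns it into row positions
low_rows <- which(as.vector(std_factor) < -2.58)  # c(2, 4)

# Years of the rows below the threshold
low_years <- year[low_rows]                       # c(2, 4)

# The same selection kept together in a data frame, via subset()
results <- data.frame(year = year, score = as.vector(std_factor))
low <- subset(results, score < -2.58)             # rows for years 2 and 4
```

Keeping the years and scores in one data frame avoids relying on two separate vectors staying in the same order.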
answered by Agarp