3

I'm running a principal component analysis on a dataset with more than 1000 variables. I'm using RStudio, and when I run summary() to see the cumulative variance of the components, I can only see the last few hundred components. How do I limit the summary to show only, say, the first 100 components?

joran
user1209675
  • Can you provide a small reproducible example? – digEmAll Apr 07 '12 at 15:30
  • @digemall Not really, the dataset is huge. I'm just running: prin <- princomp(train[c(2:1777)]); summary(prin). When I do that, it shows the info for all 1776 principal components. I only need the first 100 or so. – user1209675 Apr 07 '12 at 16:07
  • Yes, of course not the full code. I meant a little example to understand exactly your steps. Anyway @joran got the point ;) – digEmAll Apr 07 '12 at 16:44

3 Answers

2

I tried this and it seems to be working:

l <- loadings(prin)
l[, 1:100]
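Note that the loadings are the variable weights, while the cumulative variance the question asks about comes from prin$sdev. A minimal sketch of building that table for only the first n components, assuming USArrests as a stand-in for the real data (the names n, vars and importance are illustrative, not from the original post):

```r
# USArrests stands in for the real dataset; n, vars and importance are illustrative names.
prin <- princomp(USArrests, cor = TRUE)
n <- 2

# Proportion of total variance explained by each component
vars <- prin$sdev^2 / sum(prin$sdev^2)

# The same three rows that summary(prin) prints, limited to the first n components
importance <- rbind(
  `Standard deviation`     = prin$sdev[1:n],
  `Proportion of Variance` = vars[1:n],
  `Cumulative Proportion`  = cumsum(vars)[1:n]
)
print(importance)
```

For the dataset in the question, n would be 100 and prin the existing princomp fit.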

wj4f
1

It's pretty easy to modify print.summary.princomp (you can see the original code by typing stats:::print.summary.princomp) to do this:

pcaPrint <- function (x, digits = 3, loadings = x$print.loadings, cutoff = x$cutoff, n, ...) 
{
    # Check for a sensible value of n; default to full output
    if (missing(n) || n > length(x$sdev) || n < 1) {n <- length(x$sdev)}
    # Proportion of total variance explained by each component
    vars <- x$sdev^2
    vars <- vars/sum(vars)
    cat("Importance of components:\n")
    print(rbind(`Standard deviation` = x$sdev[1:n], `Proportion of Variance` = vars[1:n], 
        `Cumulative Proportion` = cumsum(vars)[1:n]))
    if (loadings) {
        cat("\nLoadings:\n")
        cx <- format(round(x$loadings, digits = digits))
        # Blank out loadings whose absolute value is below the cutoff
        cx[abs(x$loadings) < cutoff] <- paste(rep(" ", nchar(cx[1, 
            1], type = "w")), collapse = "")
        # drop = FALSE keeps the matrix layout when n is 1
        print(cx[, 1:n, drop = FALSE], quote = FALSE, ...)
    }
    invisible(x)
}

# Example: show only the first two components
pcaPrint(summary(princomp(USArrests, cor = TRUE),
              loadings = TRUE, cutoff = 0.2), digits = 2, n = 2)

Edited to include a basic check for a sensible value of n. Now that I've done this, I wonder whether it's worth suggesting to R Core as a permanent addition; it seems simple and might be useful.
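If switching functions is an option, a possible alternative (not part of the approach above) is prcomp(), whose summary object stores the importance table as a matrix that can be subset by column. A rough sketch, again assuming USArrests as stand-in data, with pc, n and first_n as illustrative names:

```r
# Alternative sketch using prcomp() rather than princomp(); USArrests is a stand-in.
pc <- prcomp(USArrests, scale. = TRUE)
n <- 2

# summary.prcomp() returns an `importance` matrix with one column per component
imp <- summary(pc)$importance

# Keep only the first n components; drop = FALSE preserves the matrix when n is 1
first_n <- imp[, 1:n, drop = FALSE]
print(first_n)
```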

joran
0

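You can put the loadings in matrix form, save the matrix to a variable, and then subset it (a la matrix[, 1:100]) to see the first/middle/last n components. In this example, I've used head(), which returns the first 100 rows (the variables). Each column is a principal component, so subset the columns to limit the number of components shown.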

head(
  matrix(
    prin$loadings,
    ncol = length(dimnames(prin$loadings)[[2]]),
    nrow = length(dimnames(prin$loadings)[[1]])
  ),
  100
)
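To make the row/column distinction concrete, here is a small sketch assuming USArrests as stand-in data. It uses unclass() instead of matrix() so the variable and component names are kept; load_mat, first_rows and first_comps are illustrative names:

```r
# USArrests stands in for the real dataset; all variable names here are illustrative.
prin <- princomp(USArrests, cor = TRUE)
n <- 2

# unclass() turns the "loadings" object into a plain matrix and keeps its dimnames
load_mat <- unclass(prin$loadings)

# head() limits rows: the first 3 variables, across all components
first_rows <- head(load_mat, 3)

# Column subsetting limits components: all variables, first n components
first_comps <- load_mat[, 1:n, drop = FALSE]

print(first_rows)
print(first_comps)
```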
Brandon Bertelsen