I'm running a principal component analysis on a dataset with more than 1000 variables. I'm using RStudio, and when I run the summary to see the cumulative variance of the components, I can only see the last few hundred components. How do I limit the summary to only show, say, the first 100 components?
-
Can you provide a small reproducible example ? – digEmAll Apr 07 '12 at 15:30
-
@digemall Not really, the dataset is huge. I'm just running: prin <- princomp(train[c(2:1777)]); summary(prin). When I do that, it shows the info for all 1776 principal components. I only need the first 100 or so. – user1209675 Apr 07 '12 at 16:07
-
Yes, of course not the full code. I meant a little example to understand exactly your steps. Anyway @joran got the point ;) – digEmAll Apr 07 '12 at 16:44
3 Answers
It's pretty easy to modify print.summary.princomp (you can see the original code by typing stats:::print.summary.princomp) to do this:
pcaPrint <- function(x, digits = 3, loadings = x$print.loadings,
                     cutoff = x$cutoff, n, ...)
{
    # Check for a sensible value of n; default to full output
    if (missing(n) || n > length(x$sdev) || n < 1) {
        n <- length(x$sdev)
    }
    idx <- seq_len(n)
    vars <- x$sdev^2
    vars <- vars / sum(vars)
    cat("Importance of components:\n")
    print(rbind(`Standard deviation` = x$sdev[idx],
                `Proportion of Variance` = vars[idx],
                `Cumulative Proportion` = cumsum(vars)[idx]))
    if (loadings) {
        cat("\nLoadings:\n")
        cx <- format(round(x$loadings, digits = digits))
        # Blank out loadings whose absolute value is below the cutoff
        cx[abs(x$loadings) < cutoff] <-
            paste(rep(" ", nchar(cx[1, 1], type = "w")), collapse = "")
        # drop = FALSE keeps the matrix layout even when n is 1
        print(cx[, idx, drop = FALSE], quote = FALSE, ...)
    }
    invisible(x)
}

pcaPrint(summary(princomp(USArrests, cor = TRUE),
                 loadings = TRUE, cutoff = 0.2), digits = 2, n = 2)
Edited to include a basic check for a sensible value of n. Now that I've done this, I wonder whether it's worth suggesting to R Core as a permanent addition; it seems simple and like it might be useful.
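If only the variance table is needed, a minimal sketch along the following lines builds the same three rows directly from the sdev component, without a custom print method. The helper name importance_table and its argument n are illustrative and not part of the stats package, and USArrests stands in for the real dataset.

```r
# Hypothetical helper: importance table for the first n components.
importance_table <- function(fit, n = length(fit$sdev)) {
  # Clamp n to the valid range of component indices.
  n <- max(1, min(n, length(fit$sdev)))
  vars <- fit$sdev^2 / sum(fit$sdev^2)
  rbind(
    `Standard deviation`     = fit$sdev,
    `Proportion of Variance` = vars,
    `Cumulative Proportion`  = cumsum(vars)
  )[, seq_len(n), drop = FALSE]
}

fit <- princomp(USArrests, cor = TRUE)
print(importance_table(fit, n = 2), digits = 3)
```

With the data from the question, something like importance_table(prin, n = 100) would give the first 100 columns, assuming prin is the princomp fit.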

joran
-
Thank you so much. Exactly what I needed. This makes datamining applications so much easier. – user1209675 Apr 07 '12 at 16:41
-
@joran: yes, it's a feature that is worth submitting to the R Core team, IMO. – digEmAll Apr 07 '12 at 16:45
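You can put the loadings in matrix form, save the matrix to a variable, and then subset it (a la matrix[,1:100]) to see the first/middle/last n. Each column is a principal component. In this example, I've used head(); note that head() on a matrix returns the first 100 rows (variables), so use the column subset when you want the first 100 components.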
head(
    matrix(
        prin$loadings,
        ncol = length(dimnames(prin$loadings)[[2]]),
        nrow = length(dimnames(prin$loadings)[[1]])
    ),
    100)
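Since head() on a matrix works on rows, the column subset mentioned above can be sketched as follows. This is only an illustration: USArrests stands in for the real data, k is a made-up variable name, and unclass() is used here to get a plain numeric matrix that keeps the variable and component names.

```r
fit <- princomp(USArrests, cor = TRUE)
k <- 2                               # leading components to keep (100 in the question)
L <- unclass(fit$loadings)           # plain numeric matrix, dimnames kept
first_k <- L[, seq_len(k), drop = FALSE]
print(round(first_k, 3))
```

Rows are the original variables and columns are the components, so first_k has one column per retained component.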

Brandon Bertelsen