
I am trying to run a PCA using the princomp function in R.

The following is the example code:

mydf <- data.frame (
    A = c("NA", rnorm(10, 4, 5)), 
    B = c("NA", rnorm(9, 4, 5), "NA"),
    C =  c("NA", "NA", rnorm(8, 4, 5), "NA")
)

out <- princomp(mydf, cor = TRUE, na.action=na.exclude)

Error in cov.wt(z) : 'x' must contain finite values only

I tried to remove the rows with NA from the dataset, but that does not work either; complete.cases() keeps every row:

ndnew <- mydf[complete.cases(mydf),]

                   A                  B                C
1                  NA                 NA               NA
2    1.67558617743171   1.28714736288378               NA
3   -1.03388645096478    9.8370942023751 10.9522215389562
4    7.10494481721949   14.7686678743866 4.06560213642725
5     13.966212462717   3.92061729913733 7.12875100279949
6   -1.91566982754146  0.842774330179978 5.26042516598668
7  0.0974919570675357    5.5264365812476 6.30783046905425
8    12.7384749395121   4.72439301946042  2.9318845479507
9    13.1859349108349 -0.546676530952666 9.98938028956806
10   4.97278207223239   6.95942086859593 5.15901566720956
11  -4.10115142119221                 NA               NA

Even if I could remove the NAs, it might not help, because every row or column has at least one missing value. Is there an R method that can impute the data while doing the PCA?


UPDATE: based on the answers:

> mydf <- data.frame (A = c(NA, rnorm(10, 4, 5)), B = c(NA, rnorm(9, 4, 5), NA),
+  C =  c(NA, NA, rnorm(8, 4, 5), NA))
> out <- princomp(mydf, cor = TRUE, na.action=na.exclude)
Error in cov.wt(z) : 'x' must contain finite values only

ndnew <- mydf[complete.cases(mydf),]
out <- princomp(ndnew, cor = TRUE, na.action=na.exclude)

This works, but the default na.action does not.

Is there any method that can impute the data? In my real data almost every column has missing values, so omitting the NAs leaves me with ~0 rows or columns.

Asked by jon, edited by John Paul
  • My answer below addresses your 'little' question about how to get the `na.action` argument to work. For your big question, about how to proceed when your data contain many NAs, a quick Google search on "missing values pca" turns up a ton of useful hits, including [this R function](http://rss.acs.unt.edu/Rdoc/library/pcaMethods/html/bpca.html). If you still need help after doing some research, I'd head over to http://stats.stackexchange.com/, since this is really a statistical question. – Josh O'Brien Apr 30 '12 at 16:37
  • @JoshO'Brien Thanks Josh, I appreciate your help. I was in a fog on this issue; now I have a clear path. – jon Apr 30 '12 at 16:54

3 Answers


It's because you used the character string "NA", which is not the same thing as R's missing value NA.

This demonstrates what I mean:

is.na("NA")   # FALSE: a character string that happens to read "NA"
is.na(NA)     # TRUE: R's missing-value marker
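A short sketch of why one "NA" string is enough to spoil a whole column: c() coerces every element to a single common type, so once a string is present the numbers become strings too. The vectors x and y are made up for illustration.

```r
# c() coerces all elements to one common type: a single string
# turns every number in the vector into a string as well
x <- c("NA", 1.5, 2.5)
class(x)     # "character"
is.na(x)     # FALSE FALSE FALSE: "NA" here is just text

# With the real missing value the vector stays numeric
y <- c(NA, 1.5, 2.5)
class(y)     # "numeric"
is.na(y)     # TRUE FALSE FALSE
```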

I'd fix it when the data are created, but here's a way to fix it retroactively. Because you used the character string "NA", the whole column is stored as character (or as factor, under the stringsAsFactors = TRUE default of R versions before 4.0), so you also need as.numeric to turn it back into numbers:

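# Turn the string "NA" into a real NA, then coerce the column to numeric
FUN <- function(x) as.numeric(ifelse(x == "NA", NA, x))
# apply() sees the data frame as a character matrix and works column by column
mydf2 <- data.frame(apply(mydf, 2, FUN))
# Subset the converted numeric data, not the original text columns
ndnew <- mydf2[complete.cases(mydf2), ]
ndnew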

which yields (the data come from unseeded rnorm() calls, so your values will differ):

                    A                 B                 C
3    11.3349957691175  6.97143301427903 -2.13578124048775
4    5.69035783905702 -2.44999550936244 -4.40642099309301
5  -0.865878644072023  6.03782080227184  9.83402859382248
6    6.58329959845638  5.67811450593805  12.4477770011262
7   0.759928613563254  16.6445809805028  9.45835418422973
8    11.3798459951171  1.36989010500538 0.784492783538675
9   0.671542080233918   5.9024564388189  16.2389092991422
10   3.64295741533713  9.78754135462621  -2.4293697924212
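Fixing it at the creation level could look like the sketch below, assuming the data are either typed in directly or read with read.csv(). The inline csv_text is made up and only stands in for a CSV exported from Excel; na.strings tells read.csv which strings to treat as missing.

```r
set.seed(1)
# Build the data with R's missing value NA, not the string "NA"
mydf <- data.frame(
  A = c(NA, rnorm(10, 4, 5)),
  B = c(NA, rnorm(9, 4, 5), NA),
  C = c(NA, NA, rnorm(8, 4, 5), NA)
)
sapply(mydf, class)   # every column stays numeric

# Reading a CSV: declare which strings (here "NA" and empty cells) mean missing
csv_text <- "A,B,C\n1.2,NA,3.4\n,5.6,7.8\n9.1,2.3,NA"
df <- read.csv(text = csv_text, na.strings = c("NA", ""))
sapply(df, class)
```

With real NAs in place, complete.cases() and na.omit() behave as expected.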

EDIT:

"this works but the defult na.action do not work"

I don't know much about princomp, but this works (I'm not sure why the function's na.action argument doesn't):

out <- princomp(na.omit(mydf), cor = TRUE)

"Is there is any method that can impute the data, as in real data I have almost every column with missing value in them ? result of such na omit will give me ~ 0 rows or columns"

This really is a separate question from your first and you should start a new thread after researching the topic on your own a little bit.
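Before reaching for imputation, a quick base-R check shows how much data listwise deletion would leave. The data frame m is hypothetical, built so that every row has exactly one missing value.

```r
# Hypothetical data: each row is missing a different column
m <- data.frame(
  A = c(NA, 2, 3, 4),
  B = c(1, NA, 3, 4),
  C = c(1, 2, NA, 4),
  D = c(1, 2, 3, NA)
)
colSums(is.na(m))        # missing values per column
sum(complete.cases(m))   # rows that na.omit() would keep
nrow(na.omit(m))         # same count, via na.omit()
```

Here every column loses only a quarter of its values, yet na.omit() leaves no rows at all, which is the situation described in the question.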

Tyler Rinker
  • Thanks, that is definitely helpful. I think I replaced blank spaces with NA while exporting the CSV from Excel; that might have turned NA into "NA". Please see my updated question. – jon Apr 30 '12 at 16:05

For na.action to have an effect, you need to explicitly supply a formula argument:

princomp(formula = ~., data = mydf, cor = TRUE, na.action=na.exclude)

# Call:
# princomp(formula = ~., data = mydf, na.action = na.exclude, cor = TRUE)
# 
# Standard deviations:
#    Comp.1    Comp.2    Comp.3 
# 1.3748310 0.8887105 0.5657149 

The formula is needed because it triggers dispatch of princomp.formula, the only princomp method that does anything useful with na.action.

methods('princomp')
[1] princomp.default* princomp.formula*

names(formals(stats:::princomp.formula))
[1] "formula"   "data"      "subset"    "na.action" "..."  

names(formals(stats:::princomp.default))
[1] "x"      "cor"    "scores" "covmat" "subset" "..."   
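To see what the formula method does with na.action, the sketch below rebuilds mydf with real NAs (the seed is arbitrary) and compares na.exclude, na.omit, and the matrix interface. As I understand the stats source, princomp.formula passes the scores through napredict(), so na.exclude pads them back to the original number of rows; treat that as an assumption to check against your R version.

```r
set.seed(1)
mydf <- data.frame(
  A = c(NA, rnorm(10, 4, 5)),
  B = c(NA, rnorm(9, 4, 5), NA),
  C = c(NA, NA, rnorm(8, 4, 5), NA)
)

# Formula interface: na.action is honoured by princomp.formula
pc_excl <- princomp(~ ., data = mydf, cor = TRUE, na.action = na.exclude)
pc_omit <- princomp(~ ., data = mydf, cor = TRUE, na.action = na.omit)

# na.exclude pads the scores with NA rows so they line up with mydf;
# na.omit returns scores for the complete rows only
nrow(pc_excl$scores)   # 11
nrow(pc_omit$scores)   # 8

# Matrix interface: drop the incomplete rows yourself first
pc_default <- princomp(na.omit(mydf), cor = TRUE)
all.equal(pc_default$sdev, pc_omit$sdev)
```

na.exclude is handy when the scores need to be bound back onto the original data frame row by row.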
Josh O'Brien

The nipals package performs PCA on data with missing values and can also return fitted values.

set.seed(1)
mydf <- data.frame (
    A = c(NA, rnorm(10, 4, 5)), 
    B = c(NA, rnorm(9, 4, 5), NA),
    C =  c(NA, NA, rnorm(8, 4, 5), NA)
)

# Remove rows with all missing values
mydf <- mydf[ !apply(mydf, 1, function(x) all(is.na(x))), ]
mydf

library(nipals)
res <- nipals(mydf, fitted=TRUE)

# Look at fitted values
res$fitted

# Compare fitted and observed values
res$fitted-mydf

               A             B            C
2   0.0062853910  0.0253433878           NA
3  -0.0005800986  0.0015428998  0.001829560
4   0.0046210396 -0.0019671275 -0.007074557
5   0.0062666341  0.0083711959 -0.001574603
6  -0.0034899784  0.0007386345  0.004800290
7   0.0018738600 -0.0097446464 -0.009368384
8   0.0003539155  0.0029634392  0.001720441
9  -0.0035414103  0.0021827218  0.005912196
10 -0.0028836774  0.0012138259  0.004404780
11  0.0001702055            NA           NA
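Fitted values like these can stand in for the missing cells before running princomp. For a dependency-free version of that idea, here is a base-R sketch of iterative PCA (SVD) imputation. It is a different, simpler technique than the NIPALS algorithm, and impute_pca() with its defaults is a hypothetical helper written for illustration, not part of any package.

```r
# Hypothetical helper: iterative PCA imputation (not NIPALS).
# Assumes every column has some observed values and non-zero variance.
impute_pca <- function(x, ncomp = 1, maxiter = 500, tol = 1e-10) {
  x <- as.matrix(x)
  stopifnot(ncomp < ncol(x))  # with all components nothing would change
  miss <- is.na(x)
  filled <- x
  # Start by replacing each missing cell with its column mean
  filled[miss] <- colMeans(x, na.rm = TRUE)[col(x)[miss]]
  for (iter in seq_len(maxiter)) {
    # Standardise the columns, as princomp(cor = TRUE) would
    mu <- colMeans(filled)
    sdv <- apply(filled, 2, sd)
    z <- scale(filled, center = mu, scale = sdv)
    # Rank-ncomp reconstruction from the leading singular vectors
    s <- svd(z, nu = ncomp, nv = ncomp)
    zhat <- s$u %*% diag(s$d[seq_len(ncomp)], ncomp) %*% t(s$v)
    xhat <- sweep(sweep(zhat, 2, sdv, "*"), 2, mu, "+")
    # Overwrite only the missing cells, then test for convergence
    change <- sum((filled[miss] - xhat[miss])^2)
    filled[miss] <- xhat[miss]
    if (change < tol) break
  }
  filled
}

set.seed(1)
mydf <- data.frame(
  A = c(NA, rnorm(10, 4, 5)),
  B = c(NA, rnorm(9, 4, 5), NA),
  C = c(NA, NA, rnorm(8, 4, 5), NA)
)
mydf <- mydf[rowSums(!is.na(mydf)) > 0, ]  # drop the all-NA first row
filled <- impute_pca(mydf, ncomp = 1)
out <- princomp(filled, cor = TRUE)
summary(out)
```

ncomp has to be smaller than the number of columns: a full-rank reconstruction just reproduces the current fill, so the starting means would never move. Observed values are never overwritten; only the missing cells are updated.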
Kevin Wright