
I'm trying to run PCA on some columns of a larger data set that contains NAs. When I remove the NAs, the number of rows no longer matches the full data set, so I cannot use it for the label info. How do I fix this?

> ef <- sepData[c(4, 5, 6, 7, 8, 9, 10)]
> autoplot(prcomp(na.omit(ef)), data = sepData, colour = 'species', label = TRUE, label.size = 3)
Error in data.frame(..., check.names = FALSE) : 
arguments imply differing number of rows: 27, 24

sepData contains the sample names on each row. When I omit the NAs, some rows are dropped, so the PCA results no longer line up with the rows (and sample names) of sepData.
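A minimal sketch of why the error occurs, plus one way to keep the rows aligned without imputing (this is not part of the answer below). It assumes a small hypothetical stand-in for sepData with made-up columns x1 to x3; `keep` and `labels` are illustrative names. The idea is to build a single logical row index with `complete.cases()` and subset both the PCA input and the label data with it.

```r
# Hypothetical toy stand-in for sepData: sample names, a species label, numeric columns with NAs
sepData <- data.frame(
  sample  = paste0("s", 1:6),
  species = c("a", "a", "b", "b", "c", "c"),
  x1 = c(1.2, NA, 3.1, 4.0, 5.5, 6.1),
  x2 = c(2.0, 2.5, NA, 3.9, 4.4, 5.0),
  x3 = c(0.5, 0.7, 0.9, 1.1, 1.4, 1.6)
)
ef <- sepData[c("x1", "x2", "x3")]

# na.omit() drops every row containing any NA, so the PCA input is shorter than sepData
nrow(na.omit(ef))   # 4
nrow(sepData)       # 6

# Keep rows aligned: one logical index, applied to both the PCA input and the labels
keep   <- complete.cases(ef)
pca    <- prcomp(ef[keep, ], scale. = TRUE)
labels <- sepData[keep, ]
```

With the rows subset consistently, passing `data = labels` (instead of the full sepData) to the original `autoplot()` call should no longer hit the row-count mismatch, at the cost of leaving the incomplete samples out of the plot.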

user974887

1 Answer


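One quick and easy way around this is to replace missing values with column (variable) medians, so that no rows are dropped and the PCA output stays aligned with your data. The zoo package has a nice function, na.aggregate(), for this; its default FUN is mean, so pass FUN = median explicitly. Let's say your matrix is called mat.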

library(zoo)
# Replace each NA with its column median; the number of rows stays the same
mat <- na.aggregate(mat, FUN = median)
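If you would rather not add a package dependency, here is a rough base-R sketch of the same column-median imputation. The toy data and the helper `impute_median()` are hypothetical, written for illustration; it only handles numeric columns and uses the median of the non-missing values in each column.

```r
# Hypothetical toy data with NAs (stand-in for the asker's numeric columns)
ef <- data.frame(
  x1 = c(1.2, NA, 3.1, 4.0, 5.5, 6.1),
  x2 = c(2.0, 2.5, NA, 3.9, 4.4, 5.0),
  x3 = c(0.5, 0.7, 0.9, 1.1, 1.4, 1.6)
)

# Hypothetical helper: replace each NA with the median of its own column
impute_median <- function(df) {
  df[] <- lapply(df, function(col) {
    col[is.na(col)] <- median(col, na.rm = TRUE)
    col
  })
  df
}

ef_imputed <- impute_median(ef)

# No rows are dropped, so the PCA scores line up row-for-row with the original data
pca <- prcomp(ef_imputed, scale. = TRUE)
nrow(pca$x)  # 6
```

Because the imputed data keeps every row, the original `autoplot(..., data = sepData, ...)` call can use the full data set for labels.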

Of course, you should really find out why your values are missing in order to impute them most appropriately, especially if there are many of them.
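Before imputing, a quick base-R check of how many values are missing, and where, can help decide whether median imputation is reasonable. A minimal sketch on hypothetical toy data:

```r
# Hypothetical toy data with NAs
ef <- data.frame(
  x1 = c(1.2, NA, 3.1, NA, 5.5, 6.1),
  x2 = c(2.0, 2.5, NA, 3.9, 4.4, 5.0),
  x3 = c(0.5, 0.7, 0.9, 1.1, 1.4, 1.6)
)

# Number of missing values in each column
colSums(is.na(ef))            # x1 2, x2 1, x3 0

# Share of missing values in each column
round(colMeans(is.na(ef)), 2)

# Rows that contain at least one NA
which(!complete.cases(ef))    # 2 3 4
```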

Joe