
I am trying to plot the first two columns of a matrix Y against each other, assigning each data point a shape and color according to the group it belongs to in the 12th column of my data set. Below is my code:

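 X <- as.matrix(course[,1:11])
 S <- cov(X)
 eig <- eigen(S)    # decompose once and reuse the result
 l <- eig$values    # eigenvalues
 e <- eig$vectors   # eigenvectors (the component is named "vectors")
 Y <- X %*% e       # project the data onto the eigenvectors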

plot(Y[,1:2],
     xlab = "PC1",
     ylab = "PC2",
     pch = c(1, 17, 8)[as.numeric(course[,12])],  # different 'pch' types 
     main = "Plot of first 2 Principal Components",
     col = c(1, 8, 1)[as.numeric(course[,12])]
     )

"course" is the data set I'm working with, and Y is the matrix I'm interested in plotting. However, one of the groups I'm basing the labeling on consists of missing values ("NA"). I can't rely on as.numeric() here, since it leaves the NA entries as NA instead of giving them a numeric code.

When I run the plot code, I get only two sets of points, and the ones for NA are ignored completely.

I would really appreciate the help.

1 Answer


You should be able to create a vector of pch values prior to calling plot(). You could do this with ?ifelse, for example. Most likely, it will be convenient to have the category with the NAs as the final else, so that you don't need a complicated matching argument. Store this in a variable (you could call it myPch), and then use that variable in your function call. That is,

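# assuming there are 3 courses: "A", "B", & "C", but some C's are NA's
# %in% (unlike ==) returns FALSE rather than NA for missing values,
# so the NA's fall through to the final else
myPch <- ifelse(course[,12] %in% "A", 1, ifelse(course[,12] %in% "B", 17, 8))
plot(..., pch=myPch, ...)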
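
As a minimal, self-contained sketch of the same idea: the data below are made up, and `grp`, `Y`, and `myCol` are hypothetical stand-ins for `course[,12]`, the PC scores, and a color vector. It tests for NA explicitly first, because a comparison such as `grp == "A"` returns NA (not FALSE) wherever `grp` is missing.

```r
# Made-up stand-in for course[,12]: two levels plus missing values
grp <- factor(rep(c("A", "B", NA), times = 10))

# Made-up stand-in for the first two columns of Y
set.seed(1)
Y <- matrix(rnorm(60), ncol = 2)

# Handle NA first; the inner comparison is only used where grp is not NA
myPch <- ifelse(is.na(grp), 8, ifelse(grp == "A", 1, 17))
myCol <- ifelse(is.na(grp), "red", ifelse(grp == "A", "black", "grey"))

# pch and col accept one value per point
plot(Y, xlab = "PC1", ylab = "PC2", pch = myPch, col = myCol)
legend("topright", legend = c("A", "B", "NA"),
       pch = c(1, 17, 8), col = c("black", "grey", "red"))
```

Building the vectors before the call also makes them easy to check, for instance with `table(myPch, useNA = "ifany")`.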
gung - Reinstate Monica
  • But pch is a plot argument; how do I go about specifying a vector that assigns specific pch values to different groups? Also, I tried ifelse in this manner: `course[,12] <- ifelse(is.na(course[,12]),0,course[,12])`, but it still doesn't work. I tried relabeling it as "missing" instead of "0", but no luck. I only get two different types of points, whereas I want three. –  Nov 11 '13 at 17:27
  • You haven't really provided a [reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example), so I'm limited in what I can show you, but I'll try to make up something & we'll see if it's enough to get the idea across. – gung - Reinstate Monica Nov 11 '13 at 17:36
  • `course <- read.table("course_happy_nomiss.txt", header = TRUE)` `levels(course$Year) <- list(MSc_4 = c("MSc", "4"), "3" = c("3"))` `> course[,12]` [1] 3 3 MSc_4 MSc_4 3 MSc_4 MSc_4 3 3 3 3 3 3 3 [16] 3 3 3 MSc_4 MSc_4 MSc_4 3 3 3 3 3 MSc_4 3 MSc_4 [31] 3 3 3 3 3 MSc_4 3 3 3 MSc_4 MSc_4 MSc_4 3 [46] MSc_4 3 3 3 MSc_4 MSc_4 3 3 MSc_4 MSc_4 MSc_4 [61] MSc_4 MSc_4 3 Levels: MSc_4 3 –  Nov 11 '13 at 17:42
  • That's all I've got. Also, I noticed that the ifelse command I included returns numerical values for all my values, even though I specified it for the NA values only. –  Nov 11 '13 at 17:45
  • You want the `ifelse` command to output numerical values only. Specifically, you want it to output the numbers for the symbols that you want to use. Use it the way I wrote it, just substituting the actual values for "A", "B" & "C". NB, I don't quite follow your output above. Do you have 2 different values coding for the same course? – gung - Reinstate Monica Nov 11 '13 at 18:13
  • I think I solved it using this: `course[,12] <- ifelse(is.na(course[,12]),3,course[,12])`. Now the plot code I used above works. –  Nov 11 '13 at 18:27
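
A small sketch of why that last fix works, using a made-up factor `yr` as a hypothetical stand-in for `course[,12]`: `ifelse()` drops the factor attributes, so the non-missing entries come back as their underlying integer codes (1 and 2 for a two-level factor) and the NA entries become 3. The result can then index a length-3 vector of symbols, as in the original plot call.

```r
# Hypothetical stand-in for course[,12], with the same level order
yr <- factor(c("3", "MSc_4", NA, "3", NA), levels = c("MSc_4", "3"))

# Non-missing values become integer codes (MSc_4 = 1, "3" = 2); NA becomes 3
codes <- ifelse(is.na(yr), 3, yr)

# The codes can now index the symbol vector without producing NA
pchs <- c(1, 17, 8)[codes]
```

This depends on the order of the factor levels, so an explicit mapping on the labels themselves may be easier to read and less fragile if the levels ever change.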