
My story: I need to make a 2-D plot with the points coloured by a third variable, which is discrete and takes integer values (20 possible values).

Findings so far: All the code I have found first converts the third variable to a factor and then colours the points by the factor levels. For instance,

library(ggplot2)

p <- qplot(mpg, wt, data = mtcars, colour = factor(cyl))

p + scale_colour_manual(values = c("red", "blue", "green"))

Question: This is where I get confused, because I am not sure which actual value of my original third variable corresponds to each level of the factor. Are the values shown in the legend the actual values or the factor levels?

Is there another way to do it without converting my variable to a factor variable?

Heroka
StayLearning
  • Welcome to SO. Please produce a [reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). – Heroka Sep 23 '15 at 07:30
  • Conceptually, a discrete variable and an ordered factor are exactly the same. The only difference is that a factor allows the possible values to have names other than integers. If your colouring variable is stored as a numeric, you should convert it to a factor and choose colours from a gradient to illustrate how they are ordered. – Backlin Sep 23 '15 at 07:48

1 Answer

If you look at the output from mtcars, you see that the values of the cyl variable (before converting to a factor) are 4, 6, and 8.

> mtcars
                     mpg cyl  disp  hp drat    wt
Mazda RX4           21.0   6 160.0 110 3.90 2.620
Mazda RX4 Wag       21.0   6 160.0 110 3.90 2.875
Datsun 710          22.8   4 108.0  93 3.85 2.320
Hornet 4 Drive      21.4   6 258.0 110 3.08 3.215
Hornet Sportabout   18.7   8 360.0 175 3.15 3.440
...

When you convert the mtcars$cyl variable to a factor, R uses the original values as the labels:

> mtcars$cyl <- as.factor(mtcars$cyl)
> str(mtcars$cyl)
 Factor w/ 3 levels "4","6","8": 2 2 1 2 3 2 3 1 1 2 ...

So the legend in your example plot shows the factor labels, and these correspond directly to your original values. In other words, it should be safe to convert your discrete variable with 20 values to a factor and use that factor to colour your graph; the labels will be correct.

If you don't want to convert your 20-value discrete variable, you can always plot it as a continuous variable, but I don't think the resulting legend is the type of legend you're after.
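To check this correspondence on your own data, a small sketch along these lines may help. The vector `x` is made-up illustration data (not from the question); the point is that the factor labels are the original values, while the internal integer codes are not.

```r
# Hypothetical integer-valued variable, standing in for the real third variable
x <- c(3L, 17L, 5L, 17L, 20L, 3L)

# factor() uses the sorted unique values as the level labels
f <- factor(x)
lvls <- levels(f)

# The internal codes are positions in levels(f), NOT the original values
codes <- as.integer(f)

# Going through the labels recovers the original values
recovered <- as.integer(as.character(f))

print(lvls)
print(codes)
print(recovered)
```

The legend drawn by ggplot2 for a factor shows the labels (`levels(f)`), so it displays the original values rather than the internal codes.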

qplot(mpg, wt, data = mtcars, colour = cyl)
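For a variable with 20 integer values, the two approaches might look roughly like the sketch below. The data frame `df` and its columns `x`, `y` and `grp` are invented for illustration, and the choice of `scale_colour_viridis_d()` (available in ggplot2 3.0.0 and later) as an ordered palette is an assumption, not something from the question. The second plot keeps the variable numeric and only asks for a legend-style guide with one key per value.

```r
library(ggplot2)

# Hypothetical data: 200 points with an integer group in 1..20
set.seed(1)
df <- data.frame(
  x   = rnorm(200),
  y   = rnorm(200),
  grp = sample(1:20, 200, replace = TRUE)
)

# Option 1: convert to a factor; the legend labels are the original integers
df$grp_f <- factor(df$grp)
p_factor <- ggplot(df, aes(x, y, colour = grp_f)) +
  geom_point() +
  scale_colour_viridis_d(name = "grp")  # ordered palette for ordered values

# Option 2: keep the variable numeric, but draw discrete legend keys
# at each observed value instead of a continuous colour bar
p_numeric <- ggplot(df, aes(x, y, colour = grp)) +
  geom_point() +
  scale_colour_continuous(breaks = sort(unique(df$grp)), guide = "legend")
```

Call `print(p_factor)` or `print(p_numeric)` to draw either plot. With 20 colours, neighbouring values can be hard to tell apart in either version, so it may be worth checking the legend against a few known points.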
tsurudak