
Is there an easy way to do a categorical scatter plot with base R?

I would like the X-axis labels to be the three column labels below. Thanks!

SW-SE        North-SW    North-SE
0.0322791   0.0466558   0.05533
0.0300673   0.0503937   0.0590444
0.0302151   0.0562131   0.0612469
0.0242698   0.068037    0.0756064
0.0315696   0.0440456   0.0449465
0.0273471   0.0485332   0.048216
0.0249796   0.055911    0.0529762
0.0219699   0.0663013   0.0651523
0.0173046   0.0467941   0.049092
0.0224143   0.0507807   0.0526732
0.0245645   0.0554949   0.0567835
0.020624    0.0691155   0.0705431
0.0208465   0.0340491   0.0525786
0.0160655   0.0382029   0.0561054
0.0236193   0.0441057   0.0597504
0.0280541   0.0561134   0.0741485
0.0242048   0.0420126   
0.0243629   0.0459014   
0.0192736   0.0476303   
0.0268329   0.0620177   
Dave2e
AGE
  • Welcome to stackoverflow. Unfortunately, it's not very clear what you're asking. Please consider reading up on http://stackoverflow.com/questions/how-to-ask and http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example – lbusett Feb 25 '17 at 22:01
  • Make two columns, one for the category and one for the quantity, and make an object out of each: `pairwise = fst$site_pair` and `Fst = fst$fst`. Then: `p <- ggplot(fst, aes(factor(pairwise), Fst))` followed by `p + geom_boxplot() + geom_jitter() + theme_bw() + scale_y_continuous(breaks = c(0, 0.01,0.02,0.03,0.04,0.05,0.06,0.07,0.08)) + labs(x= "Transects", y = expression(F[ST])) + theme(panel.grid.major = element_blank(), panel.grid.minor = element_blank(), panel.background = element_blank(), axis.line = element_line(colour = "black"))` – AGE Feb 26 '17 at 01:07

1 Answer


I think you are asking for something like a jittered scatter plot. I personally only like these with a boxplot behind them so here it is:

df = read.table(text =
  "SW-SE        North-SW    North-SE
  0.0322791   0.0466558   0.05533
  0.0300673   0.0503937   0.0590444
  0.0302151   0.0562131   0.0612469
  0.0242698   0.068037    0.0756064
  0.0315696   0.0440456   0.0449465
  0.0273471   0.0485332   0.048216
  0.0249796   0.055911    0.0529762
  0.0219699   0.0663013   0.0651523
  0.0173046   0.0467941   0.049092
  0.0224143   0.0507807   0.0526732
  0.0245645   0.0554949   0.0567835
  0.020624    0.0691155   0.0705431
  0.0208465   0.0340491   0.0525786
  0.0160655   0.0382029   0.0561054
  0.0236193   0.0441057   0.0597504
  0.0280541   0.0561134   0.0741485
  0.0242048   0.0420126   NA
  0.0243629   0.0459014   NA
  0.0192736   0.0476303   NA
  0.0268329   0.0620177   NA",
  header = TRUE
)

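jitter = 0.1  # total horizontal spread of the points around each box
n = nrow(df)  # number of rows, so the x and y vectors always match in length
boxplot(df, at = c(1, 2, 3))
# points() adds to the existing plot; rows with NA are simply not drawn
points(runif(n)*jitter - jitter/2 + 1, df$SW.SE)
points(runif(n)*jitter - jitter/2 + 2, df$North.SW)
points(runif(n)*jitter - jitter/2 + 3, df$North.SE)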

Effectively, boxplot() sets everything up. You can control the box positions with the at argument, which also makes it easy to line up the jittered scatter afterwards with points(). points() works just like plot(), but it adds to the existing figure instead of starting a new one.
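The same idea can be written as a loop so that it does not depend on the number of columns or rows. This is only a sketch: `jitter_width` and `positions` are names made up for illustration, and `check.names = FALSE` is used here so that the axis labels stay as "SW-SE", "North-SW" and "North-SE" rather than being converted to "SW.SE" and so on.

```r
# Data from the question; check.names = FALSE keeps the hyphenated column names
df <- read.table(text =
  "SW-SE        North-SW    North-SE
  0.0322791   0.0466558   0.05533
  0.0300673   0.0503937   0.0590444
  0.0302151   0.0562131   0.0612469
  0.0242698   0.068037    0.0756064
  0.0315696   0.0440456   0.0449465
  0.0273471   0.0485332   0.048216
  0.0249796   0.055911    0.0529762
  0.0219699   0.0663013   0.0651523
  0.0173046   0.0467941   0.049092
  0.0224143   0.0507807   0.0526732
  0.0245645   0.0554949   0.0567835
  0.020624    0.0691155   0.0705431
  0.0208465   0.0340491   0.0525786
  0.0160655   0.0382029   0.0561054
  0.0236193   0.0441057   0.0597504
  0.0280541   0.0561134   0.0741485
  0.0242048   0.0420126   NA
  0.0243629   0.0459014   NA
  0.0192736   0.0476303   NA
  0.0268329   0.0620177   NA",
  header = TRUE, check.names = FALSE
)

jitter_width <- 0.1          # hypothetical name: total horizontal spread
positions <- seq_along(df)   # one x position per column: 1, 2, 3

# Draw the boxes first; names = names(df) labels the x-axis with the column names
boxplot(df, at = positions, names = names(df))

# Overlay each column as jittered points centred on its box
for (i in positions) {
  y <- df[[i]]
  # one random offset per value, so x and y always have the same length
  x <- positions[i] + runif(length(y), -jitter_width / 2, jitter_width / 2)
  points(x, y)
}
```

Because the x offsets are sized from `length(y)`, adding more columns to `df` should not need any other change to the loop.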

[Image: jittered boxplot]

The reason the jitter is important is so that if you have multiple (near) identical y values they are separated slightly by the jitter and you can tell them apart more easily.
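Base R also has stripchart(), which can do the jittering itself. The sketch below is an alternative to the manual runif() approach, not what the answer above uses. To keep it short it only reads the first four rows of the question's data plus one row with a missing value, and the `jitter = 0.05` amount is an arbitrary choice.

```r
# A few rows of the question's data (one with a missing North-SE value)
df <- read.table(text =
  "SW-SE        North-SW    North-SE
  0.0322791   0.0466558   0.05533
  0.0300673   0.0503937   0.0590444
  0.0302151   0.0562131   0.0612469
  0.0242698   0.068037    0.0756064
  0.0242048   0.0420126   NA",
  header = TRUE, check.names = FALSE
)

# Categorical scatter plot on its own: one jittered strip per column
stripchart(df, vertical = TRUE, method = "jitter", jitter = 0.05, pch = 1)

# Boxplot first, then the jittered points on top with add = TRUE.
# outline = FALSE stops boxplot() drawing outliers that the strip chart draws anyway;
# ylim is set from the full data so that no point falls outside the plot.
boxplot(df, outline = FALSE, ylim = range(df, na.rm = TRUE))
stripchart(df, vertical = TRUE, method = "jitter", jitter = 0.05,
           pch = 1, add = TRUE)
```

stripchart() places the groups at 1, 2, 3 by default, which matches the default box positions of boxplot(), so the two layers line up without an explicit at argument.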

The reason I prefer the boxplot behind the jitter is so I can see the distribution more easily than mentally looking for the median and quartiles.

vincentmajor
  • What if I wanted to add three more columns, North, SW, and SE? boxplot(df, at = c(1, 2, 3, 4, 5, 6)) points(runif(20)*jitter - jitter/2 + 1, df$SE) points(runif(20)*jitter - jitter/2 + 2, df$SW) points(runif(20)*jitter - jitter/2 + 3, df$North) points(runif(20)*jitter - jitter/2 + 4, df$`North-SW`) points(runif(20)*jitter - jitter/2 + 5, df$`North-SE`) points(runif(20)*jitter - jitter/2 + 6, df$`SW-SE`) Gives me this: Error in xy.coords(x, y) : 'x' and 'y' lengths differ – AGE Feb 26 '17 at 02:47
  • I don't know what your new `df` looks like but that should work provided `df$SW` has the same length as the others. I hard coded `points(runif(20)...` but it should be `points(runif(nrow(df))...`. The error you are seeing looks like `points()` is receiving two arguments of different lengths. – vincentmajor Feb 26 '17 at 19:53
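The length mismatch described in the last comment can be reproduced without any plotting, since points() passes its arguments through xy.coords(). This is a small sketch with made-up values: `y` stands in for a column that has fewer than 20 values, and `msg` and `ok` are names used only for illustration.

```r
x <- runif(20)             # 20 x positions, as hard-coded in the original answer
y <- c(0.02, 0.03, 0.05)   # hypothetical column with only 3 values

# xy.coords() is what points() uses to validate its x and y arguments
msg <- tryCatch(
  {
    xy.coords(x, y)
    "no error"
  },
  error = function(e) conditionMessage(e)
)
print(msg)

# Sizing x from y avoids the mismatch
x_ok <- runif(length(y))
ok <- xy.coords(x_ok, y)
```

For columns of a single data frame, `nrow(df)` and `length(df$SW)` are the same number, so either can be used to size the jitter vector.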