I am plotting some data using facet_grid(), and I noticed something puzzling.

I should say up front that I am a beginner with ggplot2, so I might have missed something. Anyhow, here it goes.

Assuming the following dataframe:

library(ggplot2)

d1 <- runif(500)
d2 <- runif(500)*10
s1 <- sample(LETTERS[1:2], 500, replace = T, prob=c(0.3, 0.7))
s2 <- sample(letters[3:4], 500, replace = T, prob=c(0.4, 0.6))
df <- data.frame(s1, s2, d1, d2)

which looks like this:

s1 s2 d1        d2
B  c  0.3434944 0.9881925
A  d  0.7847741 9.7759946
A  d  0.3142764 2.3654268
...

I plot the data faceted by the two categorical variables:

ggplot(df, aes(x=df$d1, y=df$d2)) +
geom_point(col="red", cex=2) +
facet_grid(s1 ~ s2)

Resulting in the following plot:

Plot 1

I now want to overplot only a subset of the data, and I used the following (here simplified) code:

geom_point(data=df[df$d2 > 7.5,],
aes(x=df$d1[df$d2 > 7.5], y=df$d2[df$d2 > 7.5]),
cex=1, colour=I("black"))

This results in the following plot:

Plot 2

Having set a threshold, I expected every point above it to be drawn on top of an existing red point.

This does not appear to be the case.

Some red points above the threshold have no matching black point, and some black points have no matching red point. What puzzles me most is that, as I understand it, both layers come from the same dataframe, so I expect the first layer (all points) to contain the second (the thresholded subset). Am I missing something here?

Also, looking carefully, the black points sit at the right (x, y) position but in the wrong facet.

Even more puzzling: if I plot the following subsets:

ggplot(df[df$d2 < 7.5,], aes(x=df$d1[df$d2 < 7.5], y=df$d2[df$d2 < 7.5])) +
geom_point(col="red", cex=2) +
facet_grid(s1 ~ s2) +
geom_point(data=df[df$d2 > 7.5,], aes(x=df$d1[df$d2 > 7.5], y=df$d2[df$d2 > 7.5]), cex=1, colour=I("black"))

Some of the red points move from the region above the threshold to the region below it. Can anybody explain this behaviour?

Thanks a lot.

Elendhur
  • Your code is not reproducible. What are `selpmas`,`samples`,`ragdoll`,`llodgar`? See here http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example and have another go :) – J.Con Nov 07 '16 at 21:52
  • Thanks for pointing that out. I missed the editing in the most important part. This "go" should be fine. – Elendhur Nov 08 '16 at 11:44

1 Answer

I can't explain exactly why your problem occurs, but I think the subsets you built inside the plotting calls were not being matched to the facets. By creating a new TRUE/FALSE column in the dataframe, we can control the colours and size within each individual facet. Is this any good?

EDIT: Using hollow points (shape=21) and scale_fill_manual to address the question exactly.

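# Flag the points above the threshold in a new logical column
df$d <- df$d2 > 7.5

# shape = 21 is a fillable circle: the outline is fixed to red,
# the fill is taken from d (black above the threshold, red below)
ggplot(data = df, aes(x = d1, y = d2, fill = d)) +
    facet_grid(s1 ~ s2) +
    geom_point(show.legend = FALSE, shape = 21, size = 2, stroke = 1.5, colour = "red") +
    scale_fill_manual(values = c("TRUE" = "black", "FALSE" = "red"))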

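For comparison, here is a minimal sketch of the two-layer approach from the question, written with bare column names in aes() instead of df$... vectors. This is an assumption about the cause rather than something confirmed in this thread: a vector passed as df$d1 is not tied to the rows of the layer's data, so it may no longer line up with s1 and s2 once the data are split into facets. The seed and the name `above` are only there for illustration.

```r
library(ggplot2)

set.seed(42)  # assumption: fixed seed, only so the sketch is repeatable
n <- 500
df <- data.frame(
  s1 = sample(LETTERS[1:2], n, replace = TRUE, prob = c(0.3, 0.7)),
  s2 = sample(letters[3:4], n, replace = TRUE, prob = c(0.4, 0.6)),
  d1 = runif(n),
  d2 = runif(n) * 10
)

# Subset once; the subset keeps its own s1/s2 columns, so each
# thresholded point carries the facet it belongs to
above <- df[df$d2 > 7.5, ]

p <- ggplot(df, aes(x = d1, y = d2)) +                     # bare names, no df$
  geom_point(colour = "red", size = 2) +                   # all points
  geom_point(data = above, colour = "black", size = 1) +   # inherits aes(d1, d2)
  facet_grid(s1 ~ s2)
```

Printing `p` draws the plot. Because the second layer inherits the x and y mapping and evaluates it inside `above`, every black point should land on its red counterpart in the same facet.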

J.Con
  • Thanks for the reply, J.Con. The example you provided is similar to the last piece of code that I posted, where I split the data points into two subgroups (above or below the threshold). Still, the code I posted misplaces some data points in a way I cannot really understand. What I actually need, though, is to plot all the red dots and then draw the black ones on top of the matching red ones. Funny thing is, if I use `plot()` followed by `par(new=T)`, and then plot only the data points above the threshold, it works perfectly. – Elendhur Nov 08 '16 at 23:28
  • I suspect that `facet_grid()` or `ggplot()` is somehow messing up the subsets, though I cannot really figure out how. – Elendhur Nov 08 '16 at 23:29
  • Thanks again, J.Con. The edited answer does the trick! I realised that defining the threshold as you suggested (directly generating a column in the dataframe) allows proper sorting of the facets with `facet_grid()`. – Elendhur Nov 09 '16 at 15:30