
It's difficult for me to create a reproducible example of this, as the issue only seems to appear once the data frame becomes too large to paste here. I hope someone will bear with me and help. I'm sure I'm doing something stupid, but reading the help and searching have failed me (perhaps on the "stupid" issue).

I have a data frame of 2,319 rows and three variables: clientID2, month and nSlots, where clientID2 is character, month takes the values 1 to 12 and nSlots takes the values 1 or 2.

```
> head(tmpDF2)
   month clientID2 nSlots
21     1         8      1
30     2         8      1
31     4         8      1
28     5         8      1
25     6         8      1
24     7         8      1
```

Here's `table(tmpDF2$nSlots)`:

```
> table(tmpDF2$nSlots, useNA = "always")

   1    2 <NA> 
1844   15    0 
```
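Following Nate's suggestion in the comments to simulate data, here is a hypothetical stand-in for tmpDF2 with the same three columns. The number of clients (300), the 2,000 rows kept and the 1% share of nSlots == 2 are assumptions chosen only for illustration; they are not properties of the real data.

```r
# Hypothetical stand-in for tmpDF2: all sizes and proportions are assumptions
set.seed(42)
nClients <- 300

# every client-by-month combination, with client IDs stored as character
allRows <- expand.grid(month = 1:12,
                       clientID2 = as.character(seq_len(nClients)),
                       stringsAsFactors = FALSE)

# keep a random subset of rows to mimic irregular attendance
tmpDF2 <- allRows[sample(nrow(allRows), 2000), ]

# mostly one slot per month, occasionally two
tmpDF2$nSlots <- sample(1:2, nrow(tmpDF2), replace = TRUE,
                        prob = c(0.99, 0.01))

head(tmpDF2)
table(tmpDF2$nSlots, useNA = "always")
```

Increasing `nClients` and the number of rows kept is a cheap way to look for the size at which the plot starts to change.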

I'm trying to use ggplot and geom_tile to plot the clients' attendance. I expect two colours for the tiles, one for each value of nSlots, but as the data frame gets larger I get a third colour. Here is the plot.

[Image: the resulting geom_tile plot]

OK. Well, I gather you can't see that, so perhaps I should stop here! Aha, or maybe you can click through to that link. I hope so!

Here's the code, for what it's worth:

```r
library(ggplot2)

ggplot(data = tmpDF2,
       aes(x = month, y = clientID2, fill = nSlots)) +
  geom_tile() +
  # geom_text(aes(label = nSlots)) +
  theme(panel.background = element_blank()) +
  theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
  # hide the client IDs on the y axis
  theme(axis.text.y = element_blank(),
        axis.ticks.y = element_blank(),
        axis.line = element_line()) +
  ylab("clients")
```

The bizarre thing (to me) is that when I keep the number of rows small, the plot seems to work fine, but as the number goes up there comes a point at which this third colour, paler than the one in the legend, appears. I've failed utterly to find out whether a particular row in the data, or a particular value of nrow(tmpDF2), triggers it.
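A minimal sketch of one way to tell whether the third colour is in the data or only on screen (the rendering effect Jared C describes in the comments): build the plot and list the fill colours ggplot2 actually assigned to the tiles, using `layer_data()`. The simulated data frame is a hypothetical stand-in for tmpDF2, not the real data.

```r
library(ggplot2)

# Hypothetical simulated data in the same shape as tmpDF2
set.seed(42)
tmpDF2 <- data.frame(
  month     = rep(1:12, times = 200),
  clientID2 = rep(as.character(1:200), each = 12),
  nSlots    = sample(1:2, 2400, replace = TRUE, prob = c(0.99, 0.01))
)

p <- ggplot(data = tmpDF2, aes(x = month, y = clientID2, fill = nSlots)) +
  geom_tile()

# layer_data() returns the computed data for the first layer,
# including the colour actually assigned to each tile
tileFills <- unique(layer_data(p)$fill)
tileFills
length(tileFills)
```

Because nSlots is numeric here, ggplot2 maps it with a continuous gradient (the legend is a colour bar), but with only the values 1 and 2 present just two shades are assigned. If the real data also give two distinct fill values, a paler third shade on screen is most likely produced by how the graphics device draws many very thin adjacent tiles rather than by an extra value in nSlots.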

TIA,

Chris

  • Welcome to SO! If the data values don't matter for your issue, try pasting code that builds the example data frame with simulation functions, like `data.frame(a = runif(100), b = sample(LETTERS, 100, T))`, for an easier [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). – Nate Dec 02 '18 at 12:42
  • Are you sure a third colour is appearing? When I glance at the image, I see the light pink colour, but when I zoom in I see that it's just dark/white/dark lines very close to each other causing the effect. I think you need to reconsider your approach to visualizing this data. I don't believe it is an actual bug in the code. – Jared C Dec 02 '18 at 13:20
  • If you only want these two colours, why don't you set nSlots as a factor and assign colours to it via scale_fill_manual()? I also agree with @JaredC: maybe reconsider your visualisation approach. – Alex Dec 02 '18 at 13:25
  • Thanks, all three. I tried small examples and couldn't reproduce it. I had nSlots as a factor and the result was the same. I really don't think it's an optical illusion; I think that's just the not-great screen grab. Anyway, in the end I dropped geom_tile, used geom_line and got exactly what I wanted, so I think we can close this unless someone recognises it as a problem they've seen too. – cpsyctc Dec 03 '18 at 14:03
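Alex's suggestion can be sketched as follows, again on hypothetical simulated data: convert nSlots to a factor so ggplot2 uses a discrete fill scale, then fix the two colours with `scale_fill_manual()`. The colour names are arbitrary choices, not taken from the question.

```r
library(ggplot2)

# Hypothetical simulated data in the same shape as tmpDF2
set.seed(42)
tmpDF2 <- data.frame(
  month     = rep(1:12, times = 200),
  clientID2 = rep(as.character(1:200), each = 12),
  nSlots    = sample(1:2, 2400, replace = TRUE, prob = c(0.99, 0.01))
)

# treat nSlots as categorical so the fill scale is discrete
tmpDF2$nSlots <- factor(tmpDF2$nSlots, levels = c(1, 2))

p <- ggplot(data = tmpDF2, aes(x = month, y = clientID2, fill = nSlots)) +
  geom_tile() +
  # one explicit colour per factor level (arbitrary choices)
  scale_fill_manual(values = c("1" = "grey70", "2" = "firebrick")) +
  scale_x_continuous(breaks = 1:12) +
  theme(panel.background = element_blank(),
        axis.text.y = element_blank(),
        axis.ticks.y = element_blank(),
        axis.line = element_line()) +
  ylab("clients")

# the fill colours actually assigned to the tiles
unique(layer_data(p)$fill)
```

This gives a legend with exactly two keys. It is unlikely, on its own, to remove any blending the graphics device introduces between very thin adjacent tiles, which is consistent with cpsyctc's report that using a factor gave the same result.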
