
UPDATE: I found the answer... included it below.

I have a dataset that contains the following variables and similar values:

COBSDATE    CITY  RESPONSE_TIME
2011-11-23  A     1.1
2011-11-23  A     1.5
2011-11-23  A     1.2
2011-11-23  B     2.3
2011-11-23  B     2.1
2011-11-23  B     1.8
2011-11-23  C     1.4
2011-11-23  C     6.1
2011-11-23  A     3.1
2011-11-23  A     1.1

I have successfully created a graph that displays all of the response_time values and a smooth geometry to further describe some of the variation.

The challenge that I have is that I want a better view of the smoothed value, and one of the cities has frequent 'outliers'. I can control this by adding ylim(0,p99) to the plot, but this then causes the smooth to only be calculated on the subset of data.

Is there a way to use all of the data for the smoothed plot and only the subset for the jitter plot?

My code is below (both versions are the same except for the + ylim(0,20)). Truncated -

ggplot(dataRaw, aes(x=COBSDATE, y=RESPONSE_TIME)) + 
    geom_jitter(colour=alpha("#007DB1", 1/8)) + 
    geom_smooth(colour="gray30", fill=alpha("gray40",0.5)) + 
    ylim(0,20) + 
    facet_wrap(~CITY)

Whole data set -

ggplot(dataRaw, aes(x=COBSDATE, y=RESPONSE_TIME)) + 
    geom_jitter(colour=alpha("#007DB1", 1/8)) + 
    geom_smooth(colour="gray30", fill=alpha("gray40",0.5)) + 
    facet_wrap(~CITY)
shridatt
BenH
  • Can you use `dput` to give us a subset of the data so we can plot this out? – Maiasaura Feb 29 '12 at 19:33
  • Please don't put an answer in your question. There's a designated spot for answers just below! Put it as an answer and then after the waiting period is over, accept it! Answering your own question is perfectly ok here. – joran Feb 29 '12 at 20:02

2 Answers

If you just want to "zoom in", you can use coord_cartesian:

ggplot(dataRaw, aes(x=COBSDATE, y=RESPONSE_TIME)) + 
  geom_jitter(colour=alpha("#007DB1", 1/8)) + 
  geom_smooth(colour="gray30", fill=alpha("gray40",0.5)) + 
  coord_cartesian(ylim=c(0,20)) + 
  facet_wrap(~CITY)

If you want to use a subset of the data for the jitter geom, then override the data inheritance:

ggplot(dataRaw, aes(x=COBSDATE, y=RESPONSE_TIME)) + 
  geom_jitter(data=subset(dataRaw, RESPONSE_TIME>=0 & RESPONSE_TIME<=20), 
              colour=alpha("#007DB1", 1/8)) + 
  geom_smooth(colour="gray30", fill=alpha("gray40",0.5)) + 
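  # coord_cartesian() rather than ylim(): ylim(0,20) would drop out-of-range rows before the smooth is fit
  coord_cartesian(ylim=c(0,20)) + 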
  facet_wrap(~CITY)
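
The question mentions capping the axis at a 99th percentile (p99). As a rough sketch of how that limit could be derived and combined with the approach above: the data frame here is synthetic and only stands in for dataRaw, and the explicit loess arguments are there just to keep the example quiet.

```r
library(ggplot2)

# Hypothetical stand-in for dataRaw: 30 days, 3 cities, two large outliers
set.seed(42)
dataRaw <- data.frame(
  COBSDATE      = rep(seq(as.Date("2011-11-01"), by = "day", length.out = 30), each = 10),
  CITY          = rep(c("A", "B", "C"), length.out = 300),
  RESPONSE_TIME = c(rexp(298, rate = 0.5), runif(2, 40, 80))
)

# 99th percentile of the response times, used as the upper display limit
p99 <- unname(quantile(dataRaw$RESPONSE_TIME, probs = 0.99, na.rm = TRUE))

p <- ggplot(dataRaw, aes(x = COBSDATE, y = RESPONSE_TIME)) +
  # points: only rows at or below p99
  geom_jitter(data = subset(dataRaw, RESPONSE_TIME <= p99),
              colour = alpha("#007DB1", 1/8)) +
  # smooth: inherits the full dataRaw, outliers included
  geom_smooth(method = "loess", formula = y ~ x,
              colour = "gray30", fill = alpha("gray40", 0.5)) +
  # zoom the visible window without discarding any rows
  coord_cartesian(ylim = c(0, p99)) +
  facet_wrap(~CITY)
```

Printing p draws the plot; the smooth is still fitted to every row because no scale limit is set.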
Brian Diggs
Dan M.

UPDATED ANSWER: I was looking for something completely different and stumbled upon the answer I needed.

Instead of ylim(0, yMax), one should use coord_cartesian(ylim = c(0, yMax)).

It appears that coord_cartesian simply "zooms" the graph instead of truncating the data included.
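
To see the difference concretely, here is a small sketch on made-up data (the data frame d and the linear fit are assumptions chosen for brevity, not the original dataRaw or its default smoother). It fits the same smooth under both approaches and compares the fitted values.

```r
library(ggplot2)

# Synthetic data: mostly small values plus ten large outliers at the end
set.seed(1)
d <- data.frame(x = 1:200,
                y = c(rexp(190, rate = 1), runif(10, 30, 60)))

base <- ggplot(d, aes(x, y)) +
  geom_smooth(method = "lm", formula = y ~ x)

# ylim() sets scale limits: rows outside [0, 20] become NA before the fit
p_scale <- base + ylim(0, 20)

# coord_cartesian() only changes the visible window: the fit sees all rows
p_coord <- base + coord_cartesian(ylim = c(0, 20))

# Extract the fitted curve computed for each plot
# (the ylim() version warns about removed rows)
fit_scale <- layer_data(p_scale, 1)
fit_coord <- layer_data(p_coord, 1)

# The outliers pull the coord_cartesian() fit upward; the ylim() fit never saw them
mean(fit_scale$y)
mean(fit_coord$y)
```

The warning about removed rows that ylim() triggers is the usual sign that the statistic was computed on truncated data.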

shridatt