
I have a data frame that has dates and CPU utilization data for several months. I can create a smoothed plot with qplot like this:

library(ggplot2)

qplot(Date, CPU, data=app1, geom=c("line", "smooth"), method="lm",
  ylab="CPU", xlab="Date", main="")

This does not show the dates, it only shows a couple of dates. Is it possible to show dates that are of importance like if the date is greater than or smaller than smoothed line?

Again, I am sorry if I am asking too many questions. I am just learning R and going through that first pain.

The data looks like this:

Date            CPU
3/10/2012 0:00  28.7
3/9/2012 0:00   94.1
3/2/2012 0:00   82.7
2/23/2012 0:00  68.5
2/22/2012 0:00  67.4
2/10/2012 0:00  100
2/6/2012 0:00   100
2/4/2012 0:00   89.4974
2/3/2012 0:00   100
2/1/2012 0:00   100
1/29/2012 0:00  57.4693
1/25/2012 0:00  100
1/21/2012 0:00  98.2085
1/20/2012 0:00  99.9987
1/19/2012 0:00  99.9698
1/17/2012 0:00  99.9802
1/15/2012 0:00  51.5781
1/14/2012 0:00  86.5854
1/12/2012 0:00  100
1/10/2012 0:00  100
1/8/2012 0:00   48.3474
1/6/2012 0:00   99.9833
1/5/2012 0:00   100
1/2/2012 0:00   100
12/31/2011 0:00 99.6901
12/25/2011 0:00 76.543
12/21/2011 0:00 99.9536
12/19/2011 0:00 100
12/16/2011 0:00 99.9807
12/14/2011 0:00 99.9995
12/6/2011 0:00  100
3/8/2012 0:00   83.2
3/7/2012 0:00   67.7
3/6/2012 0:00   70.8
3/5/2012 0:00   92.6
2/27/2012 0:00  77.3
2/24/2012 0:00  74.1
2/21/2012 0:00  79.3
2/19/2012 0:00  57.8052
2/18/2012 0:00  99.9938
2/14/2012 0:00  100
2/9/2012 0:00   100
2/8/2012 0:00   100
2/7/2012 0:00   100
2/5/2012 0:00   57.478
2/2/2012 0:00   100
1/31/2012 0:00  100
1/30/2012 0:00  100
1/28/2012 0:00  87.604
1/27/2012 0:00  100
1/24/2012 0:00  100
1/23/2012 0:00  100
1/18/2012 0:00  100
1/16/2012 0:00  99.9477
1/13/2012 0:00  99.9979
1/9/2012 0:00   100
1/7/2012 0:00   92.6704
1/4/2012 0:00   100
1/3/2012 0:00   100
1/1/2012 0:00   17.501
12/28/2011 0:00 100
12/27/2011 0:00 100
12/23/2011 0:00 99.999
12/22/2011 0:00 100
12/20/2011 0:00 99.9865
12/18/2011 0:00 8.2211
12/15/2011 0:00 100
george willy
  • Please read about [how to make a reproducible example](http://stackoverflow.com/q/5963269/324364) before asking questions. Also, you'll get a more favorable response if you show some indication of having made an attempt yourself before asking others to do it for you. – joran Mar 15 '12 at 19:40
  • What do you mean `if the date is greater than or smaller than smoothed line`? – Ben Mar 15 '12 at 20:00
  • I cannot show all the dates on the x-axis; there is not enough space. I was thinking of showing the dates when CPU goes above or below the smooth line – george willy Mar 15 '12 at 20:07
  • Show sample data -- that is show the output of `dput(app1)` or maybe `dput(head(app1))`. As we said many times over: **reproducible examples** or else the question doesn't really have a bite. – Dirk Eddelbuettel Mar 15 '12 at 21:18
  • Well it does sound like an interesting question, but @mike smith, you'll need to give a bit more data than that to make your problem reproducible. The three observations you give aren't even enough to make your plot. Why not put enough data (can you spare 100 observations?) so we can reproduce your plot with the smooth line and at least one of the intersections of the CPU variable and the smooth line? That would make it much easier for someone to help you. – Ben Mar 16 '12 at 04:10
  • I added more data points – george willy Mar 16 '12 at 14:24
  • I really cannot find any answer to this, any ideas? – george willy Mar 16 '12 at 16:55

1 Answer


What you want is still not clear, but I'll take a stab at it.

Let's start by making your dataset reproducible.

app1 <-
structure(list(Date = structure(c(15409, 15408, 15401, 15393, 
15392, 15380, 15376, 15374, 15373, 15371, 15368, 15364, 15360, 
15359, 15358, 15356, 15354, 15353, 15351, 15349, 15347, 15345, 
15344, 15341, 15339, 15333, 15329, 15327, 15324, 15322, 15314, 
15407, 15406, 15405, 15404, 15397, 15394, 15391, 15389, 15388, 
15384, 15379, 15378, 15377, 15375, 15372, 15370, 15369, 15367, 
15366, 15363, 15362, 15357, 15355, 15352, 15348, 15346, 15343, 
15342, 15340, 15336, 15335, 15331, 15330, 15328, 15326, 15323
), class = "Date"), CPU = c(28.7, 94.1, 82.7, 68.5, 67.4, 100, 
100, 89.4974, 100, 100, 57.4693, 100, 98.2085, 99.9987, 99.9698, 
99.9802, 51.5781, 86.5854, 100, 100, 48.3474, 99.9833, 100, 100, 
99.6901, 76.543, 99.9536, 100, 99.9807, 99.9995, 100, 83.2, 67.7, 
70.8, 92.6, 77.3, 74.1, 79.3, 57.8052, 99.9938, 100, 100, 100, 
100, 57.478, 100, 100, 100, 87.604, 100, 100, 100, 100, 99.9477, 
99.9979, 100, 92.6704, 100, 100, 17.501, 100, 100, 99.999, 100, 
99.9865, 8.2211, 100)), .Names = c("Date", "CPU"), row.names = c(NA, 
-67L), class = "data.frame")

Here, the Date column is of class Date; I don't know if that is what you have or not (can't tell from what you posted; that is why a completely reproducible example was requested).
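If the column was instead read in as text like the `3/10/2012 0:00` values shown in the question, it could be converted with `as.Date()`. This is a minimal sketch, assuming the strings are month/day/year followed by a time; the `raw` vector is a hypothetical stand-in for the column as read from a file.

```r
# Hypothetical character column, as it might come out of read.table()
raw <- c("3/10/2012 0:00", "3/9/2012 0:00", "12/6/2011 0:00")

# Parse month/day/year plus the trailing time, keeping only the date part
dates <- as.Date(raw, format = "%m/%d/%Y %H:%M")

class(dates)
```

With a real `Date` column, ggplot2 uses a date scale on the x-axis rather than treating each string as a separate category.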

Converting your qplot syntax to ggplot syntax (and adding points so that I can see what is going on more easily):

library(ggplot2)

ggplot(app1, aes(x=Date, y=CPU)) +
  geom_point() +
  geom_line() +
  geom_smooth(method="lm")


Your comment

This does not show the dates, it only shows a couple of dates. Is it possible to show dates that are of importance like if the date is greater than or smaller than smoothed line?

is confusing. On the x-axis, of course only some dates are shown. You wouldn't want every point labeled. And every point would be on one side or the other of the smoothed line. So I am going to interpret your request as labeling the points on the graph that fall outside the confidence interval drawn on the graph. If this isn't what you meant, then you need to give more detail.

To do this, you need to fit the model yourself rather than have ggplot2 do the modeling.

mdl <- lm(CPU~Date, data=app1)
app2 <- cbind(app1, predict(mdl, interval="confidence"))
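To see what the `cbind()` step adds, here is a small sketch on five rows taken from the data in the question (the names `d`, `band`, and `d2` are only for illustration). With `interval="confidence"` and no `newdata`, `predict()` returns a matrix with one row per observation and the columns `fit`, `lwr`, and `upr`, which are the names the later plotting code relies on.

```r
# A few rows taken from the data in the question
d <- data.frame(
  Date = as.Date(c("2011-12-18", "2012-01-01", "2012-01-10",
                   "2012-02-10", "2012-03-10")),
  CPU  = c(8.2211, 17.501, 100, 100, 28.7)
)

mdl  <- lm(CPU ~ Date, data = d)
band <- predict(mdl, interval = "confidence")  # matrix, one row per observation

colnames(band)        # the fitted value and the two interval bounds
d2 <- cbind(d, band)  # original columns plus fit, lwr, upr
```

The confidence band describes the uncertainty of the fitted line, not the spread of individual observations, so many points can fall outside it. If a wider band is closer to what is wanted, `interval="prediction"` is the alternative to try.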

With this, the original graph can be reproduced.

ggplot(app2, aes(x=Date)) +
  geom_point(aes(y=CPU)) +
  geom_line(aes(y=CPU)) +
  geom_smooth(aes(y=fit, ymin=lwr, ymax=upr), stat="identity")


Now with this separate data set, you can further annotate points as to which ones are extreme and should be labeled.

app2 <- transform(app2,
                  extreme = (CPU < lwr) | (CPU > upr))
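The flagging logic can be checked on a tiny example. This sketch uses made-up numbers in a hypothetical `toy` data frame standing in for `app2`, and adds an illustrative `side` column (not part of the code above) that records whether a point is below or above the band.

```r
# Hypothetical values standing in for app2's CPU, lwr and upr columns
toy <- data.frame(
  CPU = c(50, 90, 99),
  lwr = c(80, 80, 80),
  upr = c(95, 95, 95)
)

# Both new columns are computed from the original columns only
toy <- transform(toy,
                 extreme = (CPU < lwr) | (CPU > upr),
                 side    = ifelse(CPU < lwr, "below",
                                  ifelse(CPU > upr, "above", "inside")))

toy[toy$extreme, ]  # only the rows outside the band
```

Note that `transform()` evaluates its arguments against the columns that already exist, so a column created in one call cannot be used within that same call. Anything that depends on `extreme` has to be added in a separate `transform()` step.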

ggplot(app2, aes(x=Date)) +
  geom_point(aes(y=CPU)) +
  geom_line(aes(y=CPU)) +
  geom_smooth(aes(y=fit, ymin=lwr, ymax=upr), stat="identity") +
  geom_text(aes(label=as.character(Date), y=CPU), data=app2[app2$extreme,],
            size=3, angle=90)


You can do even more formatting of the text to make it nicer. Here is one example.

app2 <- transform(app2,
                  hadj = ifelse(extreme, ifelse(CPU < lwr, 1.1, -0.1), NA))

ggplot(app2, aes(x=Date)) +
  geom_point(aes(y=CPU)) +
  geom_line(aes(y=CPU)) +
  geom_smooth(aes(y=fit, ymin=lwr, ymax=upr), stat="identity") +
  geom_text(aes(label=format(Date, "%b %d"), y=CPU, hjust=hadj), 
            data=app2[app2$extreme,],
            size=3, angle=90)


EDIT

You can just pull out the dates you want on the axis and pass them to the breaks argument of scale_x_date().

extremedates <- app2[app2$extreme, "Date"]

ggplot(app2, aes(x=Date)) +
  geom_point(aes(y=CPU)) +
  geom_line(aes(y=CPU)) +
  geom_smooth(aes(y=fit, ymin=lwr, ymax=upr), stat="identity") +
  scale_x_date(breaks=extremedates) +
  # opts()/theme_text() were replaced by theme()/element_text() in ggplot2 0.9.2
  theme(axis.text.x = element_text(angle=90, size=5))
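The axis labels themselves can also be shortened. This is a sketch assuming a ggplot2 version recent enough (2.0 or later) to have the `date_labels` argument of `scale_x_date()`; the small data frame `d` is a hypothetical stand-in for `app2`, with two of its four rows flagged as extreme.

```r
library(ggplot2)

# Hypothetical stand-in for app2: four dates, two flagged as extreme
d <- data.frame(
  Date    = as.Date("2012-01-01") + c(0, 9, 40, 69),
  CPU     = c(17.501, 100, 100, 28.7),
  extreme = c(TRUE, FALSE, FALSE, TRUE)
)
extremedates <- d[d$extreme, "Date"]

p <- ggplot(d, aes(x = Date, y = CPU)) +
  geom_point() +
  geom_line() +
  # Breaks only at the flagged dates, labelled as abbreviated month and day
  scale_x_date(breaks = extremedates, date_labels = "%b %d") +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, size = 5))
p
```

Since the breaks are placed only at the flagged dates, the rest of the axis stays unlabelled, so the text size can be increased if there are few extreme points.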


Brian Diggs
  • That's a great example and very interesting! How about if we just wanted the x-axis labels to appear only for those data points outside of the CI region? – Ben Mar 17 '12 at 18:46