
I am trying to plot estimated and actual depth values for a river that I worked on. During the period where the estimated and actual values overlap, the estimated line does not plot correctly, even though I have estimated values up until September 2012 (the end time on the graph).

library(ggplot2)
library(scales)
library(plyr)   # for rename()
library(grid)   # for unit()
LowerHydro<-data.frame(LowerHydrology)
LowerHydro$date <- as.Date(LowerHydro$Date, format = "%m/%d/%y")
LowerHydro<-rename(LowerHydro,c(Clarks.Lower..m.="Depth"))
qplot(main="Lower Clarks Hydrograph",xlab="Date",ylab="Depth(m)",
      date,Depth,data=LowerHydro,group=Group,color=Group,geom="line") + 
   geom_line(lwd=0.70) + 
   scale_x_date(labels=date_format("%b-%y"),
                breaks="60 days",
                limits = as.Date(c("2010-10-01","2012-09-12")),
                expand=c(0.01,0)) + 
   theme_bw()+
   labs(colour="") + 
   scale_y_continuous(expand=c(0.03,0),
                      limits=c(4,20),
                      breaks=seq(4,20,by=2),
                      labels=seq(4,20,by=2)) + 
   theme(axis.title.x=element_text(face='bold',size=16,vjust=-2)) + 
   theme(axis.title.y=element_text(face='bold',size=16,angle = 90,vjust=-0.2,hjust=0.5)) + 
   theme(plot.title=element_text(face='bold',size=25,vjust=2)) + 
   theme(axis.text.x=element_text(size=12)) + 
   theme(axis.text.y=element_text(size=12)) + 
   theme(legend.title=element_text(size=16,hjust=-0.2)) + 
   theme(legend.text=element_text(size=16)) + 
   theme(legend.key.size=unit(c(1.15,1.15),"lines")) + 
   scale_color_manual(values=c("Estimated"="black", "Actual"="blue")) + 
   theme(plot.margin = unit(c(1,-5,2,2),"lines"))

str(LowerHydro)
'data.frame':    1053 obs. of  4 variables:
$ Date : Factor w/ 1053 levels "01/01/11","01/01/12",..: 561 563 565 567 569 571 572   574 576 578 ...
$ Depth: num  5.24 5.14 5.42 5.27 5.27 ...
$ Group: Factor w/ 2 levels "Actual","Estimated": 2 2 2 2 2 2 2 2 2 2 ...
$ date : Date, format: "2010-10-01" "2010-10-02" ...

with(LowerHydro, LowerHydro[date %in% seq.Date(as.Date("2012-01-01"),   as.Date("2012-01-10"), by='1 day'),])
     Date Clarks.Lower..m.     Group
457  1/1/2012           11.242 Estimated
458  1/2/2012           11.054 Estimated
459  1/3/2012           11.054 Estimated
460  1/4/2012           10.992 Estimated
461  1/5/2012           10.773 Estimated
462  1/6/2012            9.959 Estimated
463  1/7/2012            8.739 Estimated
464  1/8/2012            7.676 Estimated
465  1/9/2012            7.019 Estimated
466 1/10/2012            6.581 Estimated

Sorry for the tedious code on the qplot... it's all aesthetics... but it seems it doesn't like that I have actual and estimated values for the same date range after October 2011. I cannot post an image, but basically I have estimated values for the entire date range, and once they coincide with the actual values, the estimated line just flatlines at a slight angle until the end of the time frame.

Here is a link to the graph:

http://s1358.beta.photobucket.com/user/jaredmilitello/media/Rplot01_zps9b29f6d3.png.html

If I edit this code to make the first date in act 2011-10-07, instead of 2011-07-10 as it was originally, I get an error... essentially this code is my dataset, apart from the random depths.

> act <- data.frame(date=seq.Date(as.Date('2011-10-07'),
                             as.Date('2012-09-12'),
                             by='1 day'),
              Depth=rnorm(n=431, sd=100),
              Group="Actual")
Error in data.frame(date = seq.Date(as.Date("2011-10-07"), as.Date("2012-09-12"),  : 
arguments imply differing number of rows: 342, 431, 1
> est <- data.frame(date=seq.Date(as.Date('2010-10-01'),
                           as.Date('2012-09-12'),
                           by='1 day'),
             Depth=rnorm(n=713, sd=100),
              Group="Estimate") 
> LowerHydro <- rbind(act, est)
> str(df)
function (x, df1, df2, ncp, log = FALSE)   
> qplot(date, Depth, data=LowerHydro, colour=Group, geom="line")
Jared
  • 3
    Could you provide some example data? Looks like the additional `geom_line(lwd=0.70)` could cause you some trouble. Why are you using it, since you already specified a `geom='line'` in the `qplot(...)` call? You can set `size=0.70` inside `qplot` instead. Try that. – Oscar de León Feb 27 '13 at 22:21
  • 3
    [How to make a reproducible example?](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) – Arun Feb 27 '13 at 22:23
  • @Jared Check out the example below. I really doubt that is the problem, but it is possible. Until you provide example data we are stuck with speculation. Also, it seems that the additional `geom_line(...)` should not be a problem, but you'll end up with each line plotted twice in the same place, one over the other (not even noticeable, I suppose). – Oscar de León Feb 27 '13 at 22:44
  • 3
    (-1), absolutely no regard to make the post a better question. – Arun Feb 27 '13 at 23:03
  • I'm not exactly sure how to make example data... I'm rather new to R, just using it for some plots and analysis in my thesis. The example data would need two groups (estimated and actual) with depths that overlap each other by date. – Jared Feb 27 '13 at 23:15
  • Holy cow. (1) Please be aware that you can put multiple things in a single `theme()` call. No need for repeating that on multiple lines. (2) Your code will be clearer (and less error prone) if you stop using `qplot`. (3) No one can help you without a reproducible example. – joran Feb 27 '13 at 23:16
  • OK, please try posting the output from `with(LowerHydro, LowerHydro[date %in% seq.Date(as.Date("2012-01-01"), as.Date("2012-01-10"), by='1 day'),])` – alexwhan Feb 27 '13 at 23:19
  • Sorry... I'll try to make a reproducible example. I just don't have any experience creating a fake dataset. The data entered by alexwhan to produce the graph is in the exact format my data is in. – Jared Feb 27 '13 at 23:33
  • And can you post the output of the code I asked for (above)? – alexwhan Feb 27 '13 at 23:53
  • isnt it in the post??? I thought I added it – Jared Feb 27 '13 at 23:54
  • OK, there are a couple of weird things about that. (1) there's no `date` variable (despite the fact it's included in the indexing...) and (2) there are no rows where `Group == Actual`, which is strange because that's the group that is being plotted in that date range. If you put the output of `dput(LowerHydro)` somewhere like pastebin I'm happy to have a look at it. – alexwhan Feb 28 '13 at 00:11
  • Does it matter that, in the text file I imported, the rows where group==actual come after all the rows where group==predicted? Actual and predicted overlap each other starting on 2011-10-07. – Jared Feb 28 '13 at 00:17
  • http://pastebin.com/R7UYEL3F – Jared Feb 28 '13 at 00:21

2 Answers

4

As the comments have already noted, we cannot help much without knowing more about your data.

There must be something wrong with your data, since there is no problem plotting two lines with overlapping time periods:

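library(ggplot2)

# Simulated "Actual" series: 431 daily values from 2011-07-10 to 2012-09-12
act <- data.frame(date=seq.Date(as.Date('2011-07-10'),
                                as.Date('2012-09-12'),
                                by='1 day'),
                  Depth=rnorm(n=431, sd=100),
                  Group="Actual")
# Simulated "Estimate" series: 713 daily values covering the whole period
est <- data.frame(date=seq.Date(as.Date('2010-10-01'),
                                as.Date('2012-09-12'),
                                by='1 day'),
                  Depth=rnorm(n=713, sd=100),
                  Group="Estimate")

LowerHydro <- rbind(act, est)
str(LowerHydro)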

qplot(date, Depth, data=LowerHydro, colour=Group, geom="line")
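A small variation on the example above, in case you want to move the start dates around: the hard-coded `n=431` and `n=713` only match these exact date ranges, and `data.frame()` stops with "arguments imply differing number of rows" as soon as they do not. A minimal sketch that derives the count from the date sequence instead (the helper name `make_series` is made up for illustration):

```r
# Hypothetical helper (not part of the original example): build one group of
# random depths whose length always matches the date sequence.
make_series <- function(from, to, group) {
  dates <- seq.Date(as.Date(from), as.Date(to), by = "1 day")
  data.frame(date  = dates,
             Depth = rnorm(n = length(dates), sd = 100),
             Group = group)
}

act <- make_series("2011-10-07", "2012-09-12", "Actual")
est <- make_series("2010-10-01", "2012-09-12", "Estimate")
LowerHydro <- rbind(act, est)

table(LowerHydro$Group)  # rows per group
```

The same `qplot(date, Depth, data=LowerHydro, colour=Group, geom="line")` call should work unchanged on this data frame.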


If you want help, make your question reproducible (see the link in comments) and give all the relevant details about your data.

Also, don't bother with all of the adjustments you're making to your plot (be aware they're not aesthetics in the ggplot2 sense) until the basic plot is working. At least don't put all of the irrelevant stuff in your question here.


EDIT

After looking at your actual data, the problem becomes obvious very quickly. If you sort out your plot without worrying about how it looks, then you should avoid running into issues like this in future.

This is what happens when I just run the original qplot:

qplot(date, Depth, data=LowerHydro, group=Group, color=Group, geom="line")


It's clear that the dates are stuffed up for the Estimated group - after the Actual measurements start, the Estimated group jumps about ten years into the future.

Now, as to why that happens, you have to go back to when you converted Date to date. You used format="%m/%d/%y", which would be great, except that the data is not consistent. For dates after about 2011-10-04, the format changes from %m/%d/%y to %m/%d/%Y (i.e. 10/01/11 to 10/01/2011). Because %y reads only two digits, a four-digit year such as 2011 is read as 20 and ends up as 2020.
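To make the mis-parse concrete, here is a small sketch using two spellings of the same day. The repair function `parse_mixed` is a made-up name and only one possible approach; it assumes every entry is month/day/year with either a two-digit or a four-digit year.

```r
# Two spellings of the same day, as found in the Date column
x <- c("10/01/11", "10/01/2011")

# %y consumes only two digits of the year, so "2011" is read as "20"
as.Date(x, format = "%m/%d/%y")
# "2011-10-01" "2020-10-01"

# Hypothetical repair: choose the format from the width of the year field
parse_mixed <- function(x) {
  x <- as.character(x)                      # Date was imported as a factor
  four_digit <- grepl("/[0-9]{4}$", x)
  out <- as.Date(x, format = "%m/%d/%y")    # correct for two-digit years
  out[four_digit] <- as.Date(x[four_digit], format = "%m/%d/%Y")
  out
}

parse_mixed(x)
# "2011-10-01" "2011-10-01"
```

Fixing the source file so that it uses a single date format is probably the cleaner option; the function above is only a workaround for data you cannot re-export.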

To avoid this in future:

  1. Check your data, and see that formats are consistent.
  2. Check your data after you do a conversion like that.
  3. Get your plot sorted before you start worrying about how it looks.
  4. Post the most minimal example to Stack Overflow, so that people aren't looking at the wrong stuff, giving you downvotes, and losing interest in helping out.
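The first two checks in the list can be sketched in a few lines of base R. The toy data frame `raw` and its values are invented for illustration; with real data you would run the same checks on the imported `Date` column and the study period you expect.

```r
# Toy stand-in for the imported file (values are invented for illustration)
raw <- data.frame(Date = c("10/02/11", "10/03/11", "10/4/2011", "10/5/2011"),
                  stringsAsFactors = FALSE)

# 1. Is the raw format consistent? Tabulate the width of the year field.
year_field <- sub(".*/", "", raw$Date)
table(nchar(year_field))   # more than one width means mixed formats

# 2. Did the conversion land inside the period you expect?
raw$date <- as.Date(raw$Date, format = "%m/%d/%y")
expected <- as.Date(c("2010-10-01", "2012-09-12"))
bad <- is.na(raw$date) | raw$date < expected[1] | raw$date > expected[2]
raw[bad, ]                 # rows that need a closer look
```

Any row that shows up in `raw[bad, ]` either failed to parse or was parsed into a date outside the study period.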
alexwhan
  • Thanks alexwhan, I noticed that as well. I appreciate it. I'm not experienced with creating a fake dataset; I just import my data. The syntax isn't familiar enough to me that I can spit out a random fake dataset that easily. I understand people's frustration; I wasn't trying to make anyone mad intentionally. – Jared Feb 28 '13 at 03:29
  • 1
    I don't think anyone gets mad; it's just frustrating trying to help when things could be explained more clearly. Spend some time looking at highly voted questions to see how best to ask. In this case, 'fake' data wouldn't help, because the problem was in your data, not your plot. Remember to accept answers that solve your problem, and upvote anything that is helpful. SO is a great place to learn. – alexwhan Feb 28 '13 at 04:07
1

Here is what I tried.

I generated some sample data to try your plot:

library(package=ggplot2)
library(package=scales)

LowerHydro <- data.frame(date=seq.Date(as.Date('2010-10-01'),
                                       as.Date('2012-09-12'),
                                       by='1 day'),
                         Depth=rnorm(n=713, sd=100),
                         Group=c(rep('Estimated', 363),
                                 rep('Actual', 350)))

And plotted it (a simplified plot, mind you)

qplot(date, Depth, data=LowerHydro, group=Group, color=Group, geom="line")+
    scale_x_date(labels=date_format("%b-%y"),breaks="60 days",
                 limits = as.Date(c("2010-10-01","2012-09-12")),
                 expand=c(0.01,0))+theme_bw()

Everything seems as expected.

Now, I added a mislabeled date at the end (the last date now has data for both Actual and Estimated):

LowerHydro <- rbind(LowerHydro, data.frame(date=as.Date('2012-09-12'),
                             Depth=rnorm(n=1, sd=100),
                             Group='Estimated'))

And then the plot breaks:

qplot(date, Depth, data=LowerHydro, group=Group, color=Group, geom="line")+
    scale_x_date(labels=date_format("%b-%y"),breaks="60 days",
                 limits = as.Date(c("2010-10-01","2012-09-12")),
                 expand=c(0.01,0))+theme_bw()

Have you checked the date range in each of Estimated and Actual data?
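One way to run that check, sketched on invented toy data (the data frame `toy` and its values are assumptions, not your data): split the dates by group and look at the first and last date of each.

```r
# Toy data (invented): the "Estimated" dates run past the shared end date
toy <- data.frame(
  date  = as.Date(c("2011-10-01", "2011-10-02", "2020-10-03",
                    "2011-10-02", "2011-10-03")),
  Group = c("Estimated", "Estimated", "Estimated", "Actual", "Actual")
)

# First date, last date and number of rows per group
ranges <- do.call(rbind, lapply(split(toy$date, toy$Group), function(d) {
  data.frame(first = min(d), last = max(d), n = length(d))
}))
ranges
```

If one group's `last` date lies far beyond the other's, the line for that group will be drawn straight across to that stray point.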

Oscar de León
  • What do you mean by checking the date range of the estimated and actual? I have estimated depths for the entire time frame (10-01-2010 until 9-12-2012) and I have actual data from a HOBO water logger that was at the bottom of the river from 10-07-2011 until 9-12-2012. – Jared Feb 27 '13 at 22:52
  • You may be asking why I have estimated for the entire time frame and it is because I set the water logger out on 10-07-2011, but I was doing a fish telemetry study and the tracking of those fish occurred from roughly 10-01-2010 until September of 2011. I wanted to back calculate the depths in the river for the timeframe the water logger was not out, so I fit a model with data from a nearby river and the relationship was almost perfect. I used that equation to come up with estimated values for the river I was working on. – Jared Feb 27 '13 at 22:53
  • I want the estimated values to show up for the entire time frame to show how good my model was at predicting water levels in this river, even though I have actual data from 10-07-2011 onwards. – Jared Feb 27 '13 at 22:54
  • my plot will work if the estimated and actual data are during two separate time frames...its where the estimated and actual overlap where the problem occurs..the plot will not plot the depth values in the estimated group from 10-07-2011 onwards until 9-12-2012....my plot does the same thing as your plot does in the example data you created...once the estimated data reaches the timeframe where the actual data starts, a straight line is drawn to the end of the timeframe for the estimated line – Jared Feb 27 '13 at 23:04
  • Yes, it was only a hint to what @alexwhan described in his answer. Sorry I didn't understand your description of the data (the overlapping). – Oscar de León Feb 28 '13 at 13:04