I've got some data I used to plot a few months ago in ggplot with no problems but now it produces very strange output. The data is structured like below:

factor  time.step   run.number  total.tigers
0.25    0           1           128
0.25    1           1           129
0.25    2           1           134
...         
0.25    0           32          122
0.25    1           32          142
0.25    2           32          153
...         
1       0           32          152
1       1           32          137
1       2           32          158

In other words, I have output total.tigers from models run at different factor levels (i.e., 0.25, 0.5, 0.75, 1). Each model was run for 240 time.steps and replicated 32 times (i.e., run.number). I'm interested in plotting total.tigers over time.step, with total.tigers summarized (mean and confidence intervals) over all 32 run.number values for each factor level.

My code looks like:

stat_sum_df <- function(fun, geom="crossbar", ...) {
  stat_summary(fun.data=fun, geom=geom, ...)
}

ggplot(data=finaldata.sub, 
       aes(x=time.step, y=total.tigers, color=factor, group=factor)) + 
  stat_sum_df("mean_cl_normal", geom = "smooth", size = 1) + xlab("Month") + ylab("Individuals") +
  scale_x_continuous(breaks = seq(0, 240, by=24)) 

However, instead of getting lines showing changes in total.tigers sequentially over time, I'm getting output that zig-zags across time.steps. [Image: plot of the zig-zagging output]

Any suggestions on resolving this?

alistaire
user2359494
  • If you want lines for each `factor` **and** each `run.number` you'll need both of those in your `group`. Maybe try `group = paste(factor, run.number)`. The lines you're seeing are probably connecting the last point for each run with the first point of the next run. – Gregor Thomas Jan 28 '16 at 00:07
  • @Gregor I think I wanted lines for each factor. Those lines should be the mean of 32 runs. In that case shouldn't the code simply average the total.tigers for each time step? In other words it doesn't even need run.number to be specified? – user2359494 Jan 28 '16 at 01:24
  • Ah, that makes more sense. In that case, I think the problem is that you're running a summary function that returns 3 values, but the geom you're using only expects one. So the lines connect all the means, then they connect back to the upper confidence intervals and connect all of them, then go back again for the lower confidence intervals. Maybe `geom_ribbon` is more appropriate? I tried it with the `?stat_summary` help and it didn't work out-of-the-box :\ – Gregor Thomas Jan 28 '16 at 02:54
  • @Gregor It definitely looks like it is connecting the mean to the upper confidence interval, etc. I tried using `stat_summary(fun.y="mean", geom = "line", size = 1)` directly in the ggplot command and it displays the mean line correctly. At least I know it isn't something weird with the data. Still need to figure out how to display the confidence intervals. – user2359494 Jan 28 '16 at 15:55
  • If you want more attention to this, I'd strongly suggest making a minimal, reproducible example. Simulate some data with 3 time steps, two factors and 5 run numbers so the problem becomes clearer, reproducible, and more approachable. I don't have time to work on it today, but if you make it *that* easy someone else probably will. – Gregor Thomas Jan 28 '16 at 16:58
  • See also [here](http://stackoverflow.com/q/5963269/903061) for reproducibility tips. – Gregor Thomas Jan 28 '16 at 16:58
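
A minimal sketch of the two suggestions in the comments above: a small simulated data set (2 factor levels, 3 time steps, 5 runs) and a ribbon for the confidence band with a separate line for the mean. The data frame `sim`, the helper `mean_ci`, and the simulated values are all hypothetical and only for illustration. `mean_ci` writes out the usual normal-theory interval (mean ± t quantile × standard error) in base R so the example does not depend on Hmisc; it is assumed to be a reasonable stand-in for `"mean_cl_normal"`, not a confirmed fix for the original plot. Converting `factor` to a factor so the colours are discrete is also an assumption.

```r
library(ggplot2)

# Hypothetical simulated data: 2 factor levels x 3 time steps x 5 runs
set.seed(1)
sim <- expand.grid(factor = c(0.25, 1), time.step = 0:2, run.number = 1:5)
sim$total.tigers <- round(130 + 5 * sim$time.step + rnorm(nrow(sim), sd = 8))
sim$factor <- factor(sim$factor)  # assumption: treat the levels as discrete groups

# Hypothetical helper: mean and normal-theory confidence interval for one group.
# Returns the y / ymin / ymax columns that stat_summary(fun.data = ...) expects.
mean_ci <- function(y, conf = 0.95) {
  n <- length(y)
  m <- mean(y)
  half <- qt(1 - (1 - conf) / 2, df = n - 1) * sd(y) / sqrt(n)
  data.frame(y = m, ymin = m - half, ymax = m + half)
}

p <- ggplot(sim, aes(x = time.step, y = total.tigers,
                     colour = factor, fill = factor, group = factor)) +
  # Ribbon uses ymin/ymax, so the interval is drawn as a band
  stat_summary(fun.data = mean_ci, geom = "ribbon", alpha = 0.2, colour = NA) +
  # Line uses only y, so it traces the per-time-step mean
  stat_summary(fun.data = mean_ci, geom = "line") +
  xlab("Month") + ylab("Individuals")

print(p)
```

Splitting the summary into two layers means each geom only receives the values it can draw, which is the mismatch the comment above suspects is behind the zig-zag.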

0 Answers