
I have a df with groups in different trials, and I want to make a bar graph of just deltas between trials in ggplot. Having a hard time getting ggplot to understand I want the differences in one df. Also, some of the treatments aren't represented in the second trial, so I want to just count that as 0 (i.e. delta would be = trial 1 - 0).

 set.seed(1)

 df <- data.frame((matrix(nrow=175,ncol=4)))
 colnames(df) <- c("group","trial","count","hour")
 df$group <- rep(c("A","B","C","D","A","B","D"),each=25)
 df$trial <- rep(c(rep(1,times=100),rep(2,times=75)))
 df$count <- runif(175,0,50)
 df$hour <- rep(1:25,times=7)


 df2 <- aggregate(df[,3:4],list(df$group,df$trial),mean)
 colnames(df2)[1:2] <- c("group","trial") 

That's where I've gotten to. I have plotted with individual bars for (group*trial), but I can't figure out how to subtract them. I want a plot of x=group and y= delta(trial).

I tried this:

 ggplot(df2 %>% group_by(group) %>% delta=diff(count),
   aes(x=group,y=delta)) + geom_bar()

from a similar posting I came across, but no luck.

Jake L
  • You have to pipe into functions. `df2 %>% group_by(group) %>% delta=diff(count)` `group_by` is a function, but `delta = ` is not a pipeable function. Presumably you want `df2 %>% group_by(group) %>% summarize(delta=diff(count))`. Or you could use `aggregate` as you did above, instead of the `dplyr` piping. – Gregor Thomas Jul 15 '19 at 15:07
  • You also have an issue that `diff()` of a single number returns nothing, not `NA`, so changing to `summarize(delta = if(n() != 2) diff(count) else NA)` will make sure you get a length-1 result for each group. – Gregor Thomas Jul 15 '19 at 15:13
  • @Gregor Thanks for clarifying the piping. So, when I use `group_by(group) %>% summarize(delta=diff(count))`, I get return of one vector, which seems like df2 just took the difference between every line and the next, instead of by group.. Then when I added your second edit, I'm getting an error involving n(). I understand aggregate[] a little better, but when I use `aggregate(df2[,3:4],list(df$group),diff())` , I get an error due to the missing C2 group. Any ideas how I can add an if to aggregate[] to negate this? – Jake L Jul 15 '19 at 15:27
  • Sounds like you loaded `plyr` after `dplyr`, ignoring the warning that printed, [as in this R-FAQ](https://stackoverflow.com/q/26106146/903061). Suggested fixes are there. – Gregor Thomas Jul 15 '19 at 15:33
  • Just that I get it: you want to plot the delta of the group means per trial? – TobiO Jul 15 '19 at 16:16
  • @TobiO yes, a bar plot centered around 0. But I actually finally just figured it out from @Gregor 's answer. There was just a small typo - should have been `if n() !=1`. Though if you have another way, feel free to share. I love learning different ways of doing the same things in R. – Jake L Jul 15 '19 at 16:29
  • @JakeL just added. The method of using delta is versatile but error-prone. If you would order your df by something else before, you might get different results. – TobiO Jul 15 '19 at 16:37
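The `diff()` behaviour discussed in the comments above can be checked in base R. This is only a small sketch; `safe_delta` is a hypothetical helper name, not something from the thread, and the choice of `NA` as the fill value is an assumption.

```r
# diff() returns one fewer element than its input
x_one <- diff(5)        # single value: result is a length-0 vector
x_two <- diff(c(5, 8))  # two values: result is one difference

length(x_one)
length(x_two)

# Hypothetical helper: always return exactly one value per group,
# falling back to `fill` when there are not exactly two inputs
safe_delta <- function(x, fill = NA_real_) {
  if (length(x) == 2) diff(x) else fill
}

safe_delta(5)
safe_delta(c(5, 8))
```

Because the helper always returns a length-1 result, it can be used inside `summarise()` without the group that has only one trial causing trouble.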

1 Answer

This should do the trick:

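library(dplyr); library(ggplot2)
# n() is the row count of the current group; nrow(.) would count all rows of df2
ggplot(df2 %>% group_by(group) %>% summarise(delta=ifelse(n()>1,diff(count),0)),
       aes(x=group,y=delta)) + geom_col() # same as geom_bar(stat="identity")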

The problem is that `diff` returns a vector of length 0, not the value 0, when it gets only one input value. Also, I recommend `geom_col` instead of `geom_bar`, because the y values are already computed and `geom_bar` counts rows by default. Another thing to keep in mind is that the result of `diff` depends on the row order of your data frame. For that reason I would recommend selecting the trials explicitly:

ggplot(df2 %>% group_by(group) %>% summarise(delta_trial_1_trial_2=
                                           ifelse(length(trial)>1,
                                                  count[trial==2]-count[trial==1],0)),
   aes(x=group,y=delta_trial_1_trial_2)) + geom_col()
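
If the delta for a group missing from trial 2 should be "trial 1 - 0", as the question describes, rather than 0, one option is to reshape the data so each trial gets its own column and fill the gaps with 0. This is a rough sketch, not the method used above. It assumes `tidyr` (1.0.0 or later) and `dplyr` (1.0.0 or later) are installed. The names `deltas`, `trial_1` and `trial_2` are made up for illustration, and the sign (trial 1 minus trial 2) follows the question's wording.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Same layout as the example data in the question (hour column omitted)
set.seed(1)
df <- data.frame(
  group = rep(c("A", "B", "C", "D", "A", "B", "D"), each = 25),
  trial = rep(c(1, 2), times = c(100, 75)),
  count = runif(175, 0, 50)
)

deltas <- df %>%
  group_by(group, trial) %>%
  summarise(count = mean(count), .groups = "drop") %>%
  # one column per trial; a group absent from a trial gets 0
  pivot_wider(names_from = trial, values_from = count,
              names_prefix = "trial_", values_fill = list(count = 0)) %>%
  mutate(delta = trial_1 - trial_2)

p <- ggplot(deltas, aes(x = group, y = delta)) + geom_col()
p
```

Since the subtraction uses named columns, the result does not depend on the row order of the data frame, and group C ends up with a delta equal to its trial 1 mean.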
TobiO