I am trying to create a “day_centtn” variable, in which day is centred on the day of the maximum tn for each ID. I have written the following code, but “day_centtn” contains a large number of seemingly random NAs, and I don’t understand how to fill in the gaps.

df <- df %>% group_by(id) %>%
mutate(day_centtn = day - day[tn == max])

My aim is then to plot tn_frac against day_centtn, but at the moment this brings up a blank grid:

p <- ggplot(df, aes(x=day_centtn, y=tn_frac, group=id))
p + geom_line(aes(colour=id)) + geom_point() +
  xlim(-5,5) + geom_vline(xintercept = 0) + ylim(0,100)

id  day tn  max day_centtn tn_frac
1   0   NA  32  NA  NA
1   1   32  32  0   100
1   2   27  32  NA  84.375
1   3   13  32  NA  40.625
1   4   NA  32  NA  NA
1   5   9   32  NA  28.125
1   6   NA  32  NA  NA
1   7   9   32  NA  28.125
1   8   NA  32  NA  NA
1   9   NA  32  8   NA
1   10  NA  32  NA  NA
1   180 NA  32  NA  NA
2   0   NA  9   NA  NA
2   1   NA  9   NA  NA
2   2   NA  9   NA  NA
2   3   8   9   NA  88.8888889
2   4   6   9   -5  66.6666667
2   5   7   9   NA  77.7777778
2   6   7   9   NA  77.7777778
2   7   7   9   NA  77.7777778
2   8   NA  9   NA  NA
2   9   9   9   NA  100
2   10  7   9   1   77.7777778
3   0   14  1935  -2  0.7235142
3   1   1167  1935  NA  60.3100775
3   2   1935  1935  0   100
3   3   1039  1935  NA  53.6950904
3   4   308 1935  2   15.9173127
3   5   112 1935  NA  5.7881137
3   6   103 1935  4   5.3229974
3   7   76  1935  NA  3.9276486
3   8   65  1935  6   3.3591731
3   9   48  1935  NA  2.4806202
3   10  27  1935  8   1.3953488

Many thanks, Annemarie

Annemarie
  • Please read the info about how to give a [reproducible example](http://stackoverflow.com/questions/5963269). This will make it much easier for others to help you. – Axeman Jan 20 '17 at 15:33
  • @Axeman I have hopefully improved this, thanks for the link. – Annemarie Jan 20 '17 at 15:52

1 Answer

I would first add a column that records whether tn equals max, and handle the NA values in tn explicitly in that comparison. This is important because comparing a value to NA returns NA (NA == 32 returns NA), so the logical index tn == max contains an NA wherever tn is missing.
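As a small base R illustration of that point (the vectors below are made-up toy values, not the asker's data), logical indexing with an NA in the index returns an NA element, so the value being subtracted ends up longer than one element and partly missing:

```r
# Comparing against NA yields NA, not FALSE
NA == 32                                   # NA

# Toy vectors (hypothetical values for illustration only)
day <- c(0, 1, 2, 3)
tn  <- c(NA, 32, 27, 13)

# The NA in the logical index produces an NA in the result
idx_with_na <- day[tn == max(tn, na.rm = TRUE)]
idx_with_na                                # NA  1

# which() keeps only the positions that are TRUE, dropping NA and FALSE
idx_clean <- day[which(tn == max(tn, na.rm = TRUE))]
idx_clean                                  # 1
```

An explicit NA check, or wrapping the comparison in which(), avoids carrying those NAs into the subtraction.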

Something like:

df %>%
  mutate(is_max = ifelse(!is.na(tn), tn == max, FALSE)) %>%
  group_by(id) %>%
  mutate(day_centtn = day - day[is_max])

However, this code will misbehave if more than one tn value equals the max within an id group, because day[is_max] then returns more than one day. If that happens, taking only the first match might work (although I have not tested it):

mutate(day_centtn = day - day[is_max][1])
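One possible alternative, sketched here on hypothetical toy data rather than the asker's df, is to use which.max(), which ignores NA and returns the position of the first maximum, so both the missing values and ties are handled in one step. This is only a sketch: it assumes dplyr is installed, computes the maximum from tn itself instead of using the existing max column, and would need extra handling for any id whose tn values are all NA.

```r
library(dplyr)

# Hypothetical toy data with the same column names as the question
toy <- data.frame(
  id  = c(1, 1, 1, 1, 2, 2, 2, 2),
  day = c(0, 1, 2, 3, 0, 1, 2, 3),
  tn  = c(NA, 32, 27, 13, 8, 9, NA, 9)   # id 2 has a tie at the maximum
)

centred <- toy %>%
  group_by(id) %>%
  # which.max() skips NA and picks the first position of the maximum
  mutate(day_centtn = day - day[which.max(tn)]) %>%
  ungroup()

centred$day_centtn
# -1  0  1  2 -1  0  1  2
```

In this toy data, day 1 is the day of the first maximum in both groups, so every row gets a non-missing offset relative to that day.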
Jeroen Boeye
  • Thank you. I think it's the missing values in troponin that's the problem isn't it (it ran fine when I just used record 3). I can't get your code to work - it's complaining that Error in eval(substitute(expr), envir, enclos) : no applicable method for 'group_by_' applied to an object of class "logical" – Annemarie Jan 20 '17 at 16:07
  • I've done it! I've used day=max day, rather than tn=max tn, and it's worked. Thank you – Annemarie Jan 20 '17 at 16:10