
I have tried finding answers in similar questions.

Being completely new to the tidyverse, I have the following question: how can I estimate a median per `ntile()` group using dplyr?

# Data
library(survival)
library(dplyr)  # provides %>%, mutate() and ntile()
data(lung)

First, I split `inst` into three groups:

p <- lung %>% mutate(test=ntile(inst,3))

So now:

table(p$test)

 1  2  3 
76 76 75 

I would like to estimate the median time, i.e. `p$time`, for each value of `p$test`.

Something like

p %>% mutate(test=ntile(inst,3), test.time=median(time[test %in% 1:3]))

Which did not provide what I sought.

cmirian
  • You already have the groups now you just need to calculate [median per group](https://stackoverflow.com/questions/25198442/how-to-calculate-mean-median-per-group-in-a-dataframe-in-r). – Ronak Shah Feb 02 '20 at 23:55

1 Answer


We can use `test` as a grouping variable to calculate the median of `time`:

library(dplyr)
lung %>% 
  group_by(test = ntile(inst, 3)) %>%
  mutate(test.time=median(time))

If a summarised output (one row per group) is needed, replace `mutate` with `summarise`.
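To make the difference between the two verbs concrete, here is a minimal sketch on a made-up data frame. The `toy` data, its values, and the names `per_row` and `per_group` are illustrative assumptions, not part of `lung`.

```r
library(dplyr)

# Hypothetical toy data standing in for lung: six rows, two columns
toy <- data.frame(
  inst = c(1, 2, 3, 4, 5, 6),
  time = c(10, 20, 30, 50, 70, 90)
)

# mutate() keeps every row and repeats the group median on each of them
per_row <- toy %>%
  group_by(test = ntile(inst, 3)) %>%
  mutate(test.time = median(time)) %>%
  ungroup()  # drop the grouping so later steps are not grouped by accident

# summarise() collapses the data to one row per group
per_group <- toy %>%
  group_by(test = ntile(inst, 3)) %>%
  summarise(test.time = median(time))
```

Here `per_row` still has six rows, while `per_group` has three, one per value of `test`. With `lung` itself, the counts shown in the question add up to 227 although the data set has 228 rows, so one `inst` value is presumably missing; that row would likely end up in its own `NA` group in the summarised output.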

akrun
  • Thank you @akrun, that did it. Can you briefly explain how I know that `median(time)` in the last line refers to `test`, which was created in the second line? I mean, why did it not estimate `median(time)` based on any other variable? – cmirian Feb 03 '20 at 06:18
  • @cmirian It is grouping only by the `test` variable created in `group_by`, so `median(time)` uses only the rows of `time` within each `test` group. – akrun Feb 03 '20 at 16:21