I have a data.frame like the one below:

df <- data.frame(ID = c(2,3,5,8,9,10,12,14,15,16,17),
                 value = c(1,2,3,4,5,6,7,8,9,10,11))
> df
   ID value
1   2     1
2   3     2
3   5     3
4   8     4
5   9     5
6  10     6
7  12     7
8  14     8
9  15     9
10 16    10
11 17    11

I would like to compute the median of value for each run of consecutive IDs. For example, the IDs in the first two rows are 2 and 3, which are consecutive. In this case, I want the median of value over those two rows, which is

> median(c(1,2))
[1] 1.5

The next runs of consecutive IDs are 8, 9, 10 and 14, 15, 16, 17. The corresponding medians are

> median(c(4,5,6))
[1] 5
> median(c(8,9,10,11))
[1] 9.5

What I ultimately want is a data.frame like the one below, with the first ID of each run and the median of that run:

   ID   median
1   2    1.5
2   8    5
3  14    9.5

I suspect rle might be useful here, but I am not sure how to apply it. How could I implement this? I would be grateful for any suggestions.

imtaiky
  • You may create a grouping variable as described here: [Create grouping variable for consecutive sequences and split vector](https://stackoverflow.com/questions/5222061/create-grouping-variable-for-consecutive-sequences-and-split-vector). (the split is not needed). Then run your favorite 'by-group' function. – Henrik Apr 08 '21 at 20:20
  • Thank you very much for your comment. Although I used the method from another answer, the URL you provided is very helpful! – imtaiky Apr 09 '21 at 12:01
  • You are welcome! As you see, the same idiom is used in the answer below: `cumsum(...diff(`. Cheers. – Henrik Apr 09 '21 at 12:33
  • Yes, I had found that point! Thank you very much. Sincerely. – imtaiky Apr 09 '21 at 15:44
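
The grouping-variable idea from the comments above can be sketched in base R roughly as follows. This is only an illustration, not the method from the linked post verbatim: the helper names `grp`, `keep`, and `res` are made up here, and the data are assumed to be the IDs as printed in the question.

```r
# Data as printed in the question (assumption: the printed table is the intended input)
df <- data.frame(ID = c(2, 3, 5, 8, 9, 10, 12, 14, 15, 16, 17),
                 value = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11))

# A new run starts at the first row and wherever the gap to the previous ID is not 1
grp <- cumsum(c(TRUE, diff(df$ID) != 1))

# Keep only rows that belong to a run of at least two consecutive IDs
keep <- ave(df$ID, grp, FUN = length) > 1

# First ID and median value of each remaining run
res <- data.frame(
  ID     = as.vector(tapply(df$ID[keep], grp[keep], min)),
  median = as.vector(tapply(df$value[keep], grp[keep], median))
)
res
```

Here `cumsum` turns each "break" in the ID sequence into a new group number, so any by-group function (`tapply`, `aggregate`, or a data.table/dplyr equivalent) can then be applied per run.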

1 Answer

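Here is a data.table option:

library(data.table)

# Group rows into runs of consecutive IDs: a new run starts wherever diff(ID) != 1
setDT(df)[
  ,
  # keep only runs with more than one row; return the first ID and the median value
  if (.N > 1) data.table(ID = min(ID), value = median(value)),
  .(grp = cumsum(c(TRUE, diff(ID) != 1)))
][
  ,
  grp := NULL  # drop the helper grouping column
][]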

which, for the data as printed in the question (IDs 2, 3, 5, 8, 9, 10, 12, 14, 15, 16, 17), gives

   ID value
1:  2   1.5
2:  8   5.0
3: 14   9.5
ThomasIsCoding
  • This works! I have not used data.table before, and I will learn more about it because it looks very useful. – imtaiky Apr 09 '21 at 12:00
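
Since the question asks whether `rle` could help, here is one possible base R sketch along those lines. It is an illustration under assumptions rather than a recommended solution: the variable names (`r`, `starts`, `ends`, `first_row`, `last_row`, `res`) are hypothetical, and the input is taken to be the data as printed in the question.

```r
# Data as printed in the question (assumption: the printed table is the intended input)
df <- data.frame(ID = c(2, 3, 5, 8, 9, 10, 12, 14, 15, 16, 17),
                 value = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11))

# Run-length encode "is the step to the next ID exactly 1?"
r <- rle(diff(df$ID) == 1)

# Positions (in the diff vector) where each run ends and starts
ends   <- cumsum(r$lengths)
starts <- ends - r$lengths + 1

# A TRUE run covering diffs i..j corresponds to rows i..(j + 1) of df
first_row <- starts[r$values]
last_row  <- ends[r$values] + 1

res <- data.frame(
  ID     = df$ID[first_row],
  median = mapply(function(a, b) median(df$value[a:b]), first_row, last_row)
)
res
```

Compared with the `cumsum(c(TRUE, diff(ID) != 1))` idiom, this needs extra index bookkeeping, because `rle` works on the differences (one element shorter than the ID vector), which is probably why the grouping-variable approach is usually preferred.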