Creating columns for each aggregation bucket with group_by()and dplyr

Question

I have a dataset where each row represents a citation to an article, along with the difference between the publication date of the article and each reference, like so:

    EID     ref    delta
1   2-s2    r1     0
2   2-s2    r2     3
3   2-s2    r3     22
4   2-s2    r4     100
5   2-s2    r5     7
6   3-s2    r6     1
7   3-s2    r7     0
8   3-s2    r8     1

I want to determine for each distinct EID, how many references fall in different ranges of year deltas (I.e. for a given article, how many references are 1 year old, 2 years old, 4 years old, etc?). I attempted to create buckets for each:

buckets=c(0,1,2,4,8,16,32,64,9999)
bt=bt %>%
  mutate(delta = as.numeric(delta)) %>%
  mutate(bucket=cut(delta, breaks = buckets))

group = bt %>%
  group_by(EID, bucket) %>%
  summarise(count=n())

The resulting grouped data is:

    EID     bucket  count
1   2-s2    (1,2]   6
2   2-s2    (2,4]   8
3   2-s2    (4,8]   16
4   2-s2    (8,16]  18
5   2-s2    (16,32] 10
6   3-s2    (1,2]   1
7   3-s2    (2,4]   13
8   3-s2    (4,8]   1
9   4-s1    (4,8]   3

I would like to create a column for each bucket I have, and then group by EID, placing the appropriate count in the appropriate bucket for each EID, where the result looks something like this:

    EID     (1,2]  (2,4]  (4,8]  (8,16] (16,32]
1   2-s2    6      8      16     18     10
2   3-s2    1      13     1      0      0
2   4-s1    0      0      3      0      0

Looking at the code I used to generate the first table, it seems like I should be able to use unstack(group, bucket~count) somehow, or just directly automate the creation of these bucket columns using summarise() but I'm not clear on exactly how to do so. Ideally, I would not have to hard-code in each column; I would like to be able to reference the bucketing list, so if I decide to change the bucketing scheme, it will update accordingly. Thank you!

score 0 · Accepted Answer · answered Feb 27 '20 at 19:57

0

We can use pivot_wider to reshape to 'wide' format

library(dplyr)
library(tidyr)
group %>%
   pivot_wider(names_from =bucket, values_from = count)

answered Feb 27 '20 at 19:57

akrun

874,273
37
540
662

Creating columns for each aggregation bucket with group_by()and dplyr

1 Answers1