
I have a dataset that specifies the number of records per person:

set.seed(99)
# Create values from a Poisson distribution
freqs <- rpois(100, 3) 
# Convert to a data frame and add an ID to each row
freqs <- as.data.frame(freqs)
freqs$id <- seq_len(nrow(freqs))

I now want each row repeated so that the value in `freqs$freqs` becomes the number of observations for that ID. For example, going from:

ID    freqs
1      3
2      1
...    ...
3      2

Ending up with:

ID    freqs
1      3
1      3
1      3
2      1
...    ...
3      2
3      2
Yolo_chicken

3 Answers


One option is `uncount` from `tidyr`:

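library(tidyr)
library(dplyr)
# Repeat each row `freqs` times; .remove = FALSE keeps the freqs column
uncount(freqs, freqs, .remove = FALSE) %>%
        as_tibble() %>%    # tibbles do not keep row names
        select(id, freqs)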
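For comparison, here is a base-R sketch of the same expansion. It is not tidyr's actual implementation: it repeats each row index `freqs` times and subsets with the result. The small data frame `df` is made-up example data with the same columns as the question's `freqs`.

```r
# Made-up example data with the same columns as the question's `freqs`
df <- data.frame(freqs = c(3L, 1L, 2L), id = 1:3)

# Repeat each row index as many times as its count
idx <- rep(seq_len(nrow(df)), times = df$freqs)

# Subset with id first (like select(id, freqs)), then reset the
# de-duplicated row names ("1", "1.1", ...) back to 1..n
expanded <- df[idx, c("id", "freqs")]
rownames(expanded) <- NULL
expanded
```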
akrun
  • Worked like a charm! Out of curiosity, my row index values now show as decimals. For example, the row indices for ID 1 are 1.0, 1.1, ..., 1.9. Why is that? I can't find any docs on it, but maybe it's an RStudio thing? – Yolo_chicken Jun 10 '19 at 16:15
  • @Yolo_chicken The reason is that duplicate row names are not allowed in a data.frame, so it uses the `make.unique` function to convert them to unique row names. – akrun Jun 10 '19 at 16:16
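The following is a minimal base-R illustration of the row-name behavior described in the comments, using a small made-up data frame `df`. When the same row is selected more than once, `[.data.frame` de-duplicates the row names. Calling `make.unique()` on a plain character vector shows the same renaming. Setting the row names to `NULL` restores the defaults.

```r
df <- data.frame(freqs = c(3L, 1L), id = 1:2)

# Row 1 is selected three times; data.frame row names must be unique,
# so the repeats get ".1", ".2" suffixes
expanded <- df[c(1, 1, 1, 2), ]
rownames(expanded)                     # -> "1" "1.1" "1.2" "2"

# The same renaming applied to a plain character vector
make.unique(c("1", "1", "1", "2"))     # -> "1" "1.1" "1.2" "2"

# Reset to the default row names 1..n
rownames(expanded) <- NULL
rownames(expanded)                     # -> "1" "2" "3" "4"
```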

Another option, using `purrr` and `plyr`, that returns only the repeated ids:

# Repeat each id by its count, then stack the pieces into one data frame
plyr::ldply(purrr::map2(freqs$id, freqs$freqs, function(x, y) rep(x, y)),
            data.frame)
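`rep()` is vectorised over its `times` argument, so the id vector that this answer builds element by element can also be produced in a single call. The sketch below uses made-up data `df`. It uses base R's `Map()` in place of `purrr::map2()` so that it runs without extra packages.

```r
df <- data.frame(freqs = c(3L, 1L, 2L), id = 1:3)

# One vectorised call: each id is repeated by its own count
ids <- rep(df$id, times = df$freqs)

# Element-wise version mirroring the map2() call, then flattened
ids_mapped <- unlist(Map(rep, df$id, df$freqs))

identical(ids, ids_mapped)             # -> TRUE
data.frame(id = ids)
```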
NelsonGon
as.data.frame(lapply(freqs, rep, freqs$freqs))

#     freqs id
# 1       3  1
# 2       3  1
# 3       3  1
# 4       1  2
# 5       4  3
# 6       4  3
# 7       4  3
# 8       4  3
# 9       8  4
# 10      8  4
# 11      8  4
# 12      8  4
# 13      8  4
# 14      8  4
# 15      8  4
# 16      8  4  
# ...
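The `lapply()` version works because a data frame is a list of columns. Each column is passed to `rep()` with `freqs$freqs` as the second argument, which matches `rep()`'s `times`. A sketch on made-up data `df`:

```r
df <- data.frame(freqs = c(3L, 1L, 2L), id = 1:3)

# lapply() calls rep(column, df$freqs) once per column;
# every column is expanded by the same counts, so lengths stay aligned
cols <- lapply(df, rep, df$freqs)

# Building a fresh data frame from the list gives default row names 1..n
expanded <- as.data.frame(cols)
expanded
```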

or

purrr::map_dfr(freqs, rep, freqs$freqs)

# # A tibble: 293 x 2
#    freqs    id
#    <int> <int>
#  1     3     1
#  2     3     1
#  3     3     1
#  4     1     2
#  5     4     3
#  6     4     3
#  7     4     3
#  8     4     3
#  9     8     4
# 10     8     4
# # ... with 283 more rows
IceCreamToucan
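One detail worth checking with Poisson counts is that `rpois()` can return 0. Because `rep()` with a count of 0 returns nothing, an id whose count is 0 disappears from the expanded data, and the result has exactly `sum(freqs)` rows. The sketch below demonstrates this with the `lapply()` approach on made-up data `df`.

```r
# Made-up counts; the 0 mimics a value rpois() can return
df <- data.frame(freqs = c(2L, 0L, 1L), id = 1:3)
expanded <- as.data.frame(lapply(df, rep, df$freqs))

# The expanded data has exactly sum(freqs) rows
nrow(expanded) == sum(df$freqs)        # -> TRUE

# ids whose count is 0 are dropped entirely
setdiff(df$id, expanded$id)            # -> 2
```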