I'm not exactly sure how to ask this question, since English isn't my first language. I want to repeat the row for each unique id 13 times and add a new column whose values run from -8 to 4 across those 13 repeated rows. I think my sample data and expected output explain it better.

sample data:

data <- data.frame(id = seq(1,100,1),
                   letters = sample(c("A", "B", "C", "D"), replace = TRUE))

> head(data)
    id letters
1    1    A
2    2    B
3    3    B
4    4    C
5    5    A
6    6    B

the expected data:

   newcol id letters
1      -8  1       A
2      -7  1       A
3      -6  1       A
4      -5  1       A
5      -4  1       A
6      -3  1       A
7      -2  1       A
8      -1  1       A
9       0  1       A
10      1  1       A
11      2  1       A
12      3  1       A
13      4  1       A
14     -8  2       B
15     -7  2       B
16     -6  2       B
17     -5  2       B

So, for each unique value in the id column, I want to create a new column with values ranging from -8 to 4 (13 different values). If possible, I would also like to know how to do it both in base R and with the data.table package.

Thank you and sorry for my poor grammar.

Tedel

1 Answer

We can use `uncount` from tidyr to repeat each row, then fill the new column within each id

library(tidyr)
library(dplyr)
data %>%
  uncount(13) %>%
  group_by(id) %>%
  mutate(newcol = -8:4) %>%
  ungroup

Or in base R

data1 <- data[rep(seq_len(nrow(data)), each = 13),]
data1$newcol <- -8:4
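
The base R version above relies on R recycling the 13 values across all rows of a plain data.frame. A minimal sketch that spells the repetition out with `rep()`, assuming one row per id as in the question; the small three-row `data` and the names `vals` and `idx` are illustrative stand-ins, not part of the original answer:

```r
# Small stand-in for the question's data, so the result is easy to inspect
data <- data.frame(id = 1:3, letters = c("A", "B", "B"))

vals <- -8:4                                   # the 13 values wanted for every id
idx  <- rep(seq_len(nrow(data)), each = length(vals))

data1 <- data[idx, ]                           # each original row repeated 13 times
data1$newcol <- rep(vals, times = nrow(data))  # explicit repetition, no implicit recycling
rownames(data1) <- NULL                        # drop the 1, 1.1, 1.2, ... row names
data1 <- data1[, c("newcol", "id", "letters")] # column order shown in the question
```

Because the right-hand side already has one value per row, the same assignment should also be accepted when the object is a data.table rather than a data.frame.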

Or using data.table

library(data.table)
setDT(data)[rep(seq_len(.N), each = 13)][, newcol := rep(-8:4, length.out = .N)][]
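
Note that `setDT(data)` converts `data` to a data.table by reference, so the original object changes class. If that is not wanted, one possible alternative is to work on a copy and let grouping generate the 13 rows. This is only a sketch under the assumption that every id appears once (as in the sample data); `dt` and `res` are illustrative names and the three-row `data` is a stand-in:

```r
library(data.table)

# Small stand-in for the question's data
data <- data.frame(id = 1:3, letters = c("A", "B", "B"))

dt <- as.data.table(data)          # a copy; `data` itself stays a data.frame

# One group per (id, letters) pair; j returns 13 rows for each group
res <- dt[, .(newcol = -8:4), by = .(id, letters)]

setcolorder(res, c("newcol", "id", "letters"))  # column order shown in the question
```

If an id can occur on several rows, this gives 13 rows per unique (id, letters) combination rather than per row, so the `rep(seq_len(.N), each = 13)` approach is the safer choice in that case.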
Gainz
akrun
  • I'm going to accept your answer in 6 minutes, when Stack Overflow allows me to. – Tedel Nov 05 '19 at 17:47
  • One more thing: on one of my datasets I get the following error: ``Error in `[<-.data.table`(x, j = name, value = value) : Supplied 13 items to be assigned to 544232 items of column 'newcol'. The RHS length must either be 1 (single values are ok) or match the LHS length exactly. If you wish to 'recycle' the RHS please use rep() explicitly to make this intent clear to readers of your code.`` Is it because this dataset is a data table instead of a data frame? Thank you. – Tedel Nov 05 '19 at 17:52
  • @Tedel The error shows that your dataset is a `data.table` instead of the `data.frame` that you showed. If it is a `data.table`, use `dt[rep(seq_len(.N), each = 13)][, newcol := rep(-8:4, length.out = .N)][]` – akrun Nov 05 '19 at 17:53
  • Wonderful, I suggest you add this inside your answer. Anyway, thanks a lot. – Tedel Nov 05 '19 at 17:54