Repeat each row of data.frame the number of times specified in a column

Question

df <- data.frame(var1 = c('a', 'b', 'c'), var2 = c('d', 'e', 'f'),
                 freq = 1:3)

What is the simplest way to expand each row the first two columns of the data.frame above, so that each row is repeated the number of times specified in the column 'freq'?

In other words, go from this:

df
  var1 var2 freq
1    a    d    1
2    b    e    2
3    c    f    3

To this:

df.expanded
  var1 var2
1    a    d
2    b    e
3    b    e
4    c    f
5    c    f
6    c    f

score 201 · Accepted Answer · answered May 24 '10 at 05:01

201

Here's one solution:

df.expanded <- df[rep(row.names(df), df$freq), 1:2]

Result:

    var1 var2
1      a    d
2      b    e
2.1    b    e
3      c    f
3.1    c    f
3.2    c    f

answered May 24 '10 at 05:01

neilfws

32,751
5
50
63

2

Great! I always forget you can use square brackets that way. I keep thinking of indexing just for subsetting or reordering. I had another solution that is far less elegant and no doubt less efficient. I might post anyway so that others can compare. – wkmor1 May 24 '10 at 10:30
27

For large `data.frame` more efficient is to replace `row.names(df)` with `seq.int(1,nrow(df))` or `seq_len(nrow(df))`. – Marek May 25 '10 at 11:54
This worked fantastically for a big data frame -- 1.5million rows, 5 cols, went very quick. Thanks! – gabe Nov 21 '12 at 06:16
6

1:2 hard codes the solution to this example, 1:ncol(df) will work for an arbitrary dataframe. – vladiim Aug 30 '18 at 06:28

score 113 · Answer 2 · edited May 18 '18 at 03:36

113

old question, new verb in tidyverse:

library(tidyr) # version >= 0.8.0
df <- data.frame(var1=c('a', 'b', 'c'), var2=c('d', 'e', 'f'), freq=1:3)
df %>% 
  uncount(freq)

    var1 var2
1      a    d
2      b    e
2.1    b    e
3      c    f
3.1    c    f
3.2    c    f

edited May 18 '18 at 03:36

Sam Firke

21,571
9
87
105

answered Feb 01 '18 at 21:12

einar

2,575
1
16
14

grate function! a lifesaver thanks!! – avallecam Jun 23 '23 at 20:45

score 48 · Answer 3 · edited Feb 26 '17 at 17:11

48

Use expandRows() from the splitstackshape package:

library(splitstackshape)
expandRows(df, "freq")

Simple syntax, very fast, works on data.frame or data.table.

Result:

    var1 var2
1      a    d
2      b    e
2.1    b    e
3      c    f
3.1    c    f
3.2    c    f

edited Feb 26 '17 at 17:11

Sotos

51,121
6
32
66

answered May 11 '15 at 16:18

Sam Firke

21,571
9
87
105

score 27 · Answer 4 · edited Jun 05 '18 at 10:46

27

@neilfws's solution works great for data.frames, but not for data.tables since they lack the row.names property. This approach works for both:

df.expanded <- df[rep(seq(nrow(df)), df$freq), 1:2]

The code for data.table is a tad cleaner:

# convert to data.table by reference
setDT(df)
df.expanded <- df[rep(seq(.N), freq), !"freq"]

edited Jun 05 '18 at 10:46

MichaelChirico

33,841
14
113
198

answered Dec 11 '14 at 23:01

Max Ghenis

14,783
16
84
132

6

another alternative: `df[rep(seq(.N), freq)][, freq := NULL]` – Jaap Apr 10 '17 at 06:04
1

another alternative `df[rep(1:.N, freq)][, freq:=NULL]` – Dale Kube Aug 11 '19 at 12:46

score 8 · Answer 5 · answered Jul 08 '19 at 06:20

Another dplyr alternative with slice where we repeat each row number freq times

library(dplyr)

df %>%  
  slice(rep(seq_len(n()), freq)) %>% 
  select(-freq)

#  var1 var2
#1    a    d
#2    b    e
#3    b    e
#4    c    f
#5    c    f
#6    c    f

seq_len(n()) part can be replaced with any of the following.

df %>% slice(rep(1:nrow(df), freq)) %>% select(-freq)
#Or
df %>% slice(rep(row_number(), freq)) %>% select(-freq)
#Or
df %>% slice(rep(seq_len(nrow(.)), freq)) %>% select(-freq)

rdornas · Answer 6 · 2020-05-08T20:46:29.010

7

I know this is not the case but if you need to keep the original freq column, you can use another tidyverse approach together with rep:

library(purrr)

df <- data.frame(var1 = c('a', 'b', 'c'), var2 = c('d', 'e', 'f'), freq = 1:3)

df %>% 
  map_df(., rep, .$freq)
#> # A tibble: 6 x 3
#>   var1  var2   freq
#>   <fct> <fct> <int>
#> 1 a     d         1
#> 2 b     e         2
#> 3 b     e         2
#> 4 c     f         3
#> 5 c     f         3
#> 6 c     f         3

^{Created on 2019-12-21 by the reprex package (v0.3.0)}

edited May 08 '20 at 20:46

answered Dec 21 '19 at 03:54

rdornas

630
7
15

5

Or just use `.remove = FALSE` in `uncount()` – Jun 30 '20 at 15:35

vonjd · Answer 7 · 2015-07-07T16:51:04.473

6

In case you have to do this operation on very large data.frames I would recommend converting it into a data.table and use the following, which should run much faster:

library(data.table)
dt <- data.table(df)
dt.expanded <- dt[ ,list(freq=rep(1,freq)),by=c("var1","var2")]
dt.expanded[ ,freq := NULL]
dt.expanded

See how much faster this solution is:

df <- data.frame(var1=1:2e3, var2=1:2e3, freq=1:2e3)
system.time(df.exp <- df[rep(row.names(df), df$freq), 1:2])
##    user  system elapsed 
##    4.57    0.00    4.56
dt <- data.table(df)
system.time(dt.expanded <- dt[ ,list(freq=rep(1,freq)),by=c("var1","var2")])
##    user  system elapsed 
##    0.05    0.01    0.06

edited Jul 07 '15 at 16:51

answered Jul 06 '15 at 10:18

vonjd

4,202
3
44
68

1

I get an error: `Error in rep(1, freq) : invalid 'times' argument`. And given that there is already a data.table answer to this question, you may want to describe how your approach is different or when it is better than the current data.table answer. Or if there's not a major difference, you could add it as a comment to the existing answer instead. – Sam Firke Jul 07 '15 at 16:20
@SamFirke: Thank you for your comment. Strange, I just tried it again and I get no such error. Do you use the original `df`from the OP's question? My answer is better because the other answer is kind of misusing the `data.table` package by using `data.frame` syntax, see the FAQ of `data.table`: " It is generally bad practice to refer to columns by number rather than name." – vonjd Jul 07 '15 at 16:37
1

Thanks for the explanation. Your code works for me on the sample `df` posted by the OP, but when I tried to benchmark this on a larger data.frame I got that error. The data.frame I used was: `set.seed(1) dfbig <- data.frame(var1=sample(letters, 1000, replace = TRUE), var2=sample(LETTERS, 1000, replace = TRUE), freq=sample(1:10, 1000, replace = TRUE))` On the tiny data.frame, the base answer does well in my benchmarking, it just doesn't scale well to bigger data.frames. The other three answers ran successfully with this larger data.frame. – Sam Firke Jul 07 '15 at 17:06
@SamFirke: This is indeed strange, it should work there too and I don't know why it doesn't. Do you want to create a question out of it or shall I? – vonjd Jul 07 '15 at 17:18
Good idea. Can you? I don't know `data.table` syntax so I shouldn't be the one judging the answers. – Sam Firke Jul 07 '15 at 17:22
@SamFirke: Did it, so let's see what comes in... http://stackoverflow.com/questions/31276157/strange-error-when-expanding-data-table – vonjd Jul 07 '15 at 18:06
@SamFirke: We found the problem, it was quite a ride: http://stackoverflow.com/questions/31276157/strange-error-when-expanding-data-table – vonjd Jul 07 '15 at 19:17
Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/82644/discussion-between-sam-firke-and-vonjd). – Sam Firke Jul 07 '15 at 19:42

score 4 · Answer 8 · answered May 21 '19 at 20:14

Another possibility is using tidyr::expand:

library(dplyr)
library(tidyr)

df %>% group_by_at(vars(-freq)) %>% expand(temp = 1:freq) %>% select(-temp)

#> # A tibble: 6 x 2
#> # Groups:   var1, var2 [3]
#>   var1  var2 
#>   <fct> <fct>
#> 1 a     d    
#> 2 b     e    
#> 3 b     e    
#> 4 c     f    
#> 5 c     f    
#> 6 c     f

One-liner version of vonjd's answer:

library(data.table)

setDT(df)[ ,list(freq=rep(1,freq)),by=c("var1","var2")][ ,freq := NULL][]

#>    var1 var2
#> 1:    a    d
#> 2:    b    e
#> 3:    b    e
#> 4:    c    f
#> 5:    c    f
#> 6:    c    f

^{Created on 2019-05-21 by the reprex package (v0.2.1)}

score 2 · Answer 9 · answered Oct 12 '22 at 05:19

2

I am providing one more addition to this wonderful thread of nice answers! Use the tidyr package (included in tidyverse) for a one-liner solution:

df %>% tidyr::uncount(weights = freq)

answered Oct 12 '22 at 05:19

a1a5a6

41
4

score 1 · Answer 10 · answered Dec 02 '20 at 05:54

in fact. use the methods of vector and index. we can also achieve the same result, and more easier to understand:

rawdata <- data.frame('time' = 1:3, 
           'x1' = 4:6,
           'x2' = 7:9,
           'x3' = 10:12)

rawdata[rep(1, time=2), ] %>% remove_rownames()
#  time x1 x2 x3
# 1    1  4  7 10
# 2    1  4  7 10

Repeat each row of data.frame the number of times specified in a column

10 Answers10

Linked

Related