Making rbind loop faster

Question

I have a two-column data frame with a value in the left column and the frequency of that value in the right column. I want to expand this into a new one-column data frame in which each value is repeated as many times as its frequency.

I have got it working with the two for loops below, but with my data (100k+ rows and many data frames) it's very slow. I've tried using the apply functions but can't work it out.

library(tidyverse)

twocol <- tribble(
  ~value, ~count,
  0.23076923, 5,
  0.69076923, 3,
  1.15230769, 4,
  1.61384615, 4,
  2.15230769, 3
) %>% as.data.frame()

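make_onecol <- function(df) {
  # start with a single NA row; it stays as the first row of the result
  dfnew <- data.frame(value = NA)
  # drop values that never occur
  df <- df %>% filter(count != 0)
  for (i in seq_len(nrow(df))) {
    n <- df[i, 2]
    for (j in seq_len(n)) {
      # slow part: every rbind call copies the whole data frame built so far
      dfnew <- rbind(dfnew, df[i, 1])
    }
  }
  return(dfnew)
}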

onecol <- make_onecol(twocol)

I don't speak tidyverse but is your goal to repeat each `value` `count` times? Then you can simply do `rep(twocol$value, twocol$count)` — Roland, Dec 15 '17 at 12:32

Jaap · Answer 1 · 2017-12-15T12:38:34.530

3

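You can just use the `rep` function for that: when its second argument is a vector, each value is repeated the corresponding number of times. The leading `NA` reproduces the first row that the loop version creates. Using: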

onecol <- data.frame(value = c(NA, rep(twocol$value, twocol$count)))

gives:

> onecol
       value
1         NA
2  0.2307692
3  0.2307692
4  0.2307692
5  0.2307692
6  0.2307692
7  0.6907692
8  0.6907692
9  0.6907692
10 1.1523077
11 1.1523077
12 1.1523077
13 1.1523077
14 1.6138462
15 1.6138462
16 1.6138462
17 1.6138462
18 2.1523077
19 2.1523077
20 2.1523077
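
A minimal base R sketch of why `rep` can replace both loops, assuming the leading `NA` row is not actually wanted (it only comes from the `data.frame(value=NA)` initialisation in the loop version). The extra row with value 9.99 and count 0 is hypothetical and is added only to show that zero counts need no separate filtering.

```r
# Example data from the question, plus one hypothetical zero-count row
twocol <- data.frame(
  value = c(0.23076923, 0.69076923, 1.15230769, 1.61384615, 2.15230769, 9.99),
  count = c(5, 3, 4, 4, 3, 0)
)

# With a vector `times`, rep() repeats each element its own number of times;
# an element whose count is 0 is simply left out
onecol <- data.frame(value = rep(twocol$value, times = twocol$count))

nrow(onecol)         # same as sum(twocol$count)
table(onecol$value)  # recovers the counts of the non-zero rows
```

Because the whole vector is built in one call, nothing is copied row by row as it is with repeated `rbind` calls.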

edited Dec 15 '17 at 12:38

answered Dec 15 '17 at 12:34


Perfect thank you! Man I went to a lot of trouble for nothing! :) – jimbo Dec 15 '17 at 12:36

pogibas · Answer 2 · 2017-12-15T12:42:44.050

1

The same `rep` approach wrapped in data.table (note that `setDT` converts `twocol` to a data.table by reference):

library(data.table)
setDT(twocol)[, .(value = rep(value, count))]
#     value
# 0.2307692
# 0.2307692
# 0.2307692
# 0.2307692
# 0.2307692
# 0.6907692
# 0.6907692
# 0.6907692
# 1.1523077
# 1.1523077
# 1.1523077
# 1.1523077
# ...

edited Dec 15 '17 at 12:42

answered Dec 15 '17 at 12:33


Your earlier solution `data.frame(value = with(two_col, rep(value, count)))` performs better than `data.table` when generating 150,000 rows. I haven't checked with more rows. – Kushdesh Dec 15 '17 at 12:59
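
A rough way to check a timing claim like this yourself, sketched with `system.time`. The synthetic data frame `big` is hypothetical (50,000 values with a count of 3 each, giving 150,000 rows), and the data.table variant only runs if the package is installed. Single timings are noisy, so the numbers are only indicative.

```r
# Hypothetical input: 50,000 values whose counts sum to 150,000 rows
set.seed(1)
big <- data.frame(value = runif(50000), count = rep(3, 50000))

# Base R variant from the comment above
t_base <- system.time(
  res_base <- data.frame(value = with(big, rep(value, count)))
)
print(t_base)

# data.table variant, timed only when data.table is available
if (requireNamespace("data.table", quietly = TRUE)) {
  dt <- data.table::as.data.table(big)
  t_dt <- system.time(
    res_dt <- dt[, list(value = rep(value, count))]
  )
  print(t_dt)
}
```

For more reliable numbers, repeat each call many times and compare the distributions instead of a single run.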
