I have a data frame generated by a function. Each call returns a different number of rows:

structure(list(a = c(1, 2, 3), b = c("er", "gd", "ku"), c = c(43, 
453, 12)), .Names = c("a", "b", "c"), row.names = c(NA, -3L), class = c("tbl_df", 
"tbl", "data.frame"))

structure(list(a = c(1, 2), b = c("er", "gd"), c = c(43, 453)), .Names = c("a", 
"b", "c"), row.names = c(NA, -2L), class = c("tbl_df", "tbl", 
"data.frame"))

As in a while loop, I want to keep binding rows only while the total number of rows is less than n (n = 4, 100, 4242...).

How can I do this with functional programming, without a while loop? For example, with n = 10 the combined df might have 7 rows before the last bind_rows and 20 rows after it. That's OK: I want the final number of rows to be the smallest reachable total k with k >= n.
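The stopping rule can be checked on the row counts alone. A minimal sketch, assuming the row counts are already known; the values in `sizes` are hypothetical and chosen to match the n = 10 example (7 rows, then 20):

```r
# Hypothetical row counts of successive data frames.
sizes <- c(7, 13, 5)
n <- 10

# Running total of rows after each bind.
running <- cumsum(sizes)

# Index of the first data frame at which the running total reaches n.
k <- which(running >= n)[1]

# Total number of rows when binding stops.
running[k]
```

Only the first `k` data frames would be bound; anything after that is ignored.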

Here is my while loop doing this:

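# Collected data frames
b <- list()
# Running total of rows across everything collected so far
total_rows <- 0

# Keep producing data frames until the total reaches 1000 rows
while (total_rows < 1000) {
  df <- f_produce_rand_df()
  b[[length(b) + 1]] <- df
  total_rows <- total_rows + nrow(df)
}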
SteveS
  • What exactly is the goal of this? Do you need the length of the resulting data.frame, or do you want to combine them conditionally? – hannes101 Jul 12 '18 at 13:05
  • @hannes101 please look at my edited question. – SteveS Jul 12 '18 at 13:10
  • Still, I can't really see what you are trying to achieve and, to be fair, I can't really see what this code achieves. – hannes101 Jul 12 '18 at 13:16
  • @hannes101 what exactly don't you understand? As I explained, I am getting tons of dataframes, all of them with the same columns. Is that OK so far? Now I want to BIND them by rows, OK? I want to control the bind operation: if the number of rows of the bound dataframe so far is more than the specified n, stop binding and return the df. Is it clear? – SteveS Jul 12 '18 at 13:19
  • Sorry, it was meant to be df. – SteveS Jul 12 '18 at 13:20
  • @AndreElrico I have fixed it, please advise. – SteveS Jul 12 '18 at 13:21
  • Why do you count the "number of dfs" if you are interested in the number of rows? – Andre Elrico Jul 12 '18 at 13:21
  • Again, each df has its own number of rows; it's different each time. @AndreElrico – SteveS Jul 12 '18 at 13:21
  • I want to apply bind_rows to the dfs up until the stop condition: the total number of rows is n or greater, and if it's more than n, it should be the last df that pushes it over. @AndreElrico – SteveS Jul 12 '18 at 13:22
  • Well, I can't help you with that. The while loop seems to be an appropriate construct for your task. – Andre Elrico Jul 12 '18 at 13:35
  • Understood! Thanks guys! – SteveS Jul 12 '18 at 13:36
  • Check out `rbindlist` from the `data.table` package or the `bind_rows()` function from the `dplyr` package. They would most probably speed this up. With data.table and its speed, you perhaps don't need to split it up anyway. – hannes101 Jul 12 '18 at 13:38
  • In general, all looping constructs can be replaced by recursion. I am not entirely sure about the specifics of what you are trying to achieve, but it's 100% doable with recursion. – UpsideDownRide Jul 12 '18 at 13:46
  • In addition to my comment on using `data.table`: you can do just that and then split the result afterwards, for example as in this case: https://stackoverflow.com/questions/32125795/split-data-table-into-roughly-equal-parts – hannes101 Jul 13 '18 at 06:11
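
The recursion suggested in the comments could look roughly like the sketch below. It is an illustration under assumptions, not a tested solution from the thread: `collect_until` is a hypothetical helper name, the body of `f_produce_rand_df` is a made-up stand-in for the real function, and base `rbind` is used in place of `bind_rows` so the example needs no packages.

```r
# Hypothetical stand-in for the question's f_produce_rand_df():
# returns a data frame with a random number of rows (1 to 5).
f_produce_rand_df <- function() {
  n <- sample.int(5, 1)
  data.frame(
    a = seq_len(n),
    b = sample(letters, n, replace = TRUE),
    c = runif(n),
    stringsAsFactors = FALSE
  )
}

# Recursively collect data frames until the running row count reaches n.
# acc holds the data frames gathered so far; total is their combined row count.
collect_until <- function(n, producer, acc = list(), total = 0) {
  if (total >= n) {
    return(acc)
  }
  df <- producer()
  collect_until(n, producer, c(acc, list(df)), total + nrow(df))
}

set.seed(1)
pieces <- collect_until(10, f_produce_rand_df)

# Bind everything once at the end (assumes all pieces share the same columns).
result <- do.call(rbind, pieces)
nrow(result) >= 10
```

The last data frame is the one that pushes the total to n or beyond, so the result can overshoot n, as described in the question. Two caveats: the recursion never ends if the producer keeps returning zero-row data frames, and R limits recursion depth, so for a large n with small data frames the while loop may still be the safer choice.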

0 Answers