
Suppose I have a function like this:

foo <- function(var) {
  if(length(var) > 5) stop("can't be greater than 5")

  data.frame(var = var)
}

This works:

df <- 1:20

foo(var = df[1:5])

But this doesn't, because the input has more than 5 elements:

foo(var = df)

The desired output is:

   var
1    1
2    2
3    3
4    4
5    5
6    6
7    7
8    8
9    9
10  10
11  11
12  12
13  13
14  14
15  15
16  16
17  17
18  18
19  19
20  20

If I know that I can only run this function in chunks of 5 rows, what would be the best approach if I wanted to evaluate all 20 rows? Can I use purrr::map() for this? Assume that the 5-row constraint is rigid.

Thanks in advance.

boshek
  • Do you really want to iterate over the `Species` column? That's a vector of 150 elements: setosa, setosa, setosa, ... – Jack Brookes Mar 22 '18 at 17:56
  • I guess you want some version of `split`. The tidyverse doesn't have its own version of split; even the example usage uses base `split()`: https://github.com/tidyverse/purrr#usage Seems like it would be way more intuitive to use `dplyr` here. – MrFlick Mar 22 '18 at 18:36
  • @MrFlick I agree completely. My toy example suffers a little. In my actual example I want to take the first 20 rows, apply those to a function, then take the next 20 rows, and so on. This is the best way I could create a reprex for it. – boshek Mar 22 '18 at 18:43
  • Then just add a column to your data.frame that changes every 20 rows with `mutate()` and use `dplyr`. – MrFlick Mar 22 '18 at 18:47
  • @MrFlick I don't think I've explained myself very well. I'll try to revise the question. If I can't, I'll just go ahead and delete it. – boshek Mar 22 '18 at 18:52
  • @MrFlick Revised to hopefully be clearer. – boshek Mar 22 '18 at 19:15
  • What's the desired output? – MrFlick Mar 22 '18 at 19:20
  • @MrFlick - added to the question – boshek Mar 22 '18 at 20:41
  • purrr doesn't have a splitting feature. Use one like this: https://stackoverflow.com/questions/3318333/split-a-vector-into-chunks-in-r then use `map_dfr` over those chunks. – MrFlick Mar 22 '18 at 20:50
  • @MrFlick - did you want to write this up as an answer? The answer below does what I did, but I used your input. I'll give you the opportunity first: if you want to write an answer, I'll accept yours to reflect the time you spent here. – boshek Mar 22 '18 at 21:20
  • @boshek I’m just glad you got something that works. I don’t need to type something up for points. Either accept the other answer or write up your own. Fine with me. – MrFlick Mar 22 '18 at 21:24

2 Answers

Split df into chunks of 5 elements, use purrr::map_dfr to apply foo to each chunk, and bind the results together by rows:

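library(tidyverse)

foo <- function(var) {
  if (length(var) > 5) stop("can't be greater than 5")

  data.frame(var = var)
}

df <- 1:20

# Group index 0,0,0,0,0,1,1,... so each group holds at most 5 elements
df_split <- split(df, (seq_along(df) - 1) %/% 5)
df_split

# Apply foo to each chunk and row-bind the resulting data frames
map_dfr(df_split, ~ foo(.x))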

   var
1    1
2    2
3    3
4    4
5    5
6    6
7    7
8    8
9    9
10  10
11  11
12  12
13  13
14  14
15  15
16  16
17  17
18  18
19  19
20  20
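
The grouping index is what does the work in this approach, so here is a minimal base R sketch of the same idea without purrr. The helper name `chunk_apply` and its arguments are made up for illustration (it is not part of purrr or base R), and it assumes a non-empty input vector.

```r
# Same toy function as in the question: rejects more than 5 elements.
foo <- function(var) {
  if (length(var) > 5) stop("can't be greater than 5")
  data.frame(var = var)
}

# Hypothetical helper: split x into chunks of at most `size` elements,
# apply f to each chunk, and row-bind the resulting data frames.
# Assumes x is non-empty.
chunk_apply <- function(x, size, f) {
  grp <- (seq_along(x) - 1) %/% size  # 0,0,...,1,1,... one value per chunk
  pieces <- lapply(split(x, grp), f)
  out <- do.call(rbind, pieces)
  rownames(out) <- NULL               # drop the "0.1", "0.2", ... row names
  out
}

x <- 1:12                             # length is not a multiple of 5
idx <- (seq_along(x) - 1) %/% 5       # the last chunk holds only 2 elements
res <- chunk_apply(x, 5, foo)
```

Because the index is built from positions rather than values, the last chunk is simply shorter when the length is not a multiple of the chunk size.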
Tung

You can use dplyr::group_by or tapply:

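library(dplyr)

# df is the 1:20 vector from the question; data.frame(df) names its column "df"
data.frame(df) %>%
  mutate(grp = (row_number() - 1) %/% 5) %>%  # chunk index: 0,0,0,0,0,1,...
  group_by(grp) %>%
  mutate(var = foo(df)$var) %>%               # within a group, df is that chunk only
  ungroup() %>%
  select(var)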

# # A tibble: 20 x 1
#     var
# <int>
# 1     1
# 2     2
# 3     3
# 4     4
# 5     5
# 6     6
# 7     7
# 8     8
# 9     9
# 10    10
# 11    11
# 12    12
# 13    13
# 14    14
# 15    15
# 16    16
# 17    17
# 18    18
# 19    19
# 20    20

# Grouping on (df - 1) works only because df is 1:20; use seq_along(df) for other vectors
data.frame(var = unlist(tapply(df, (df - 1) %/% 5, foo)))
#    var
# 01   1
# 02   2
# 03   3
# 04   4
# 05   5
# 11   6
# 12   7
# 13   8
# 14   9
# 15  10
# 21  11
# 22  12
# 23  13
# 24  14
# 25  15
# 31  16
# 32  17
# 33  18
# 34  19
# 35  20
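
The comments mention that the real use case is taking a fixed number of rows of a data frame at a time. A rough base R sketch of that variant follows. `foo_rows`, `dat`, and the `chunk_total` column are invented stand-ins for illustration; the real function and data are not shown in the question.

```r
# Hypothetical stand-in for a function that accepts at most 5 rows at a time.
foo_rows <- function(d) {
  if (nrow(d) > 5) stop("can't be more than 5 rows")
  d$chunk_total <- sum(d$x)  # some per-chunk computation
  d
}

# Made-up example data with 20 rows.
dat <- data.frame(id = 1:20, x = (1:20) * 2)

# Row-based chunk index: 0 for rows 1-5, 1 for rows 6-10, and so on.
grp <- (seq_len(nrow(dat)) - 1) %/% 5

# split() on a data frame splits its rows; apply the function per chunk.
pieces <- lapply(split(dat, grp), foo_rows)

# Row-bind the chunks back together and reset the row names.
out <- do.call(rbind, pieces)
rownames(out) <- NULL
```

The same index works with `purrr::map_dfr(split(dat, grp), foo_rows)` if purrr is preferred.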
moodymudskipper