How to iterate (functionally) through rows of a data.frame in R and process as if looping?

Question

Is there an "apply"-type method that lets us iterate through a data.frame and process the rows exactly as if we were looping? When I call apply(df, 1, function(row){...}), the row passed to the function is NOT an actual data.frame row.

df = data.frame(A=rnorm(3), B=letters[1:3])

for (i in 1:3)
{
  row = df[i,]
  print(row)
  print(class(row))
  print(typeof(row))
  print(row$A)
  print(row$B)
}

apply(df, 1, function(row)
{
  print(row)
  print(class(row))
  print(typeof(row))
  print(row$A)
  print(row$B)
})

> df = data.frame(A=rnorm(3), B=letters[1:3])
> 
> for (i in 1:3)
+ {
+     row = df[i,]
+     print(row)
+     print(class(row))
+     print(typeof(row))
+     print(row$A)
+     print(row$B)
+ }
          A B
1 0.4179416 a
[1] "data.frame"
[1] "list"
[1] 0.4179416
[1] a
Levels: a b c
        A B
2 1.35868 b
[1] "data.frame"
[1] "list"
[1] 1.35868
[1] b
Levels: a b c
           A B
3 -0.1027877 c
[1] "data.frame"
[1] "list"
[1] -0.1027877
[1] c
Levels: a b c
> 
> apply(df, 1, function(row)
+ {
+     print(row)
+     print(class(row))
+     print(typeof(row))
+     print(row$A)
+     print(row$B)
+ })
           A            B 
" 0.4179416"          "a" 
[1] "character"
[1] "character"
 Error in row$A : $ operator is invalid for atomic vectors

Edit 1

A comment to this answer says that apply turns the data.frame into a matrix so you end up getting vectors. I guess that's the problem. Maybe time for a dedicated data.frame iterator?
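The coercion described in that comment can be seen directly. This is a minimal sketch, assuming a small data frame with fixed values; `df_demo` is a name made up for this illustration, not part of the original question.

```r
# Illustrative data frame (hypothetical name); fixed values instead of rnorm()
# so the results are reproducible
df_demo <- data.frame(A = c(0.5, 1.5, -0.25), B = letters[1:3])

# apply() first converts its input with as.matrix(); because one column is
# non-numeric, the whole matrix becomes character
m <- as.matrix(df_demo)
is.character(m)
is.data.frame(m[1, ])

# So the function given to apply() receives a character vector for each row,
# not a one-row data frame, and `$` fails on it
row_classes <- apply(df_demo, 1, function(row) class(row))
row_classes
```

Inside `apply`, values can still be reached by name with `row[["A"]]`, but they arrive as strings and would need converting back with something like `as.numeric()`.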

Edit 2

As @thelatemail pointed out, this may really be a duplicate of "For each row in an R dataframe".

I don't think there's a way to avoid it using `apply` - you could `lapply` over `seq_len(nrow(df))` though if you strictly want to avoid using `for` — thelatemail, Jun 06 '19 at 22:18
To be clear this is documented in `?apply` - "*If X is not an array but an object of a class with a non-null dim value (such as a data frame), apply attempts to coerce it to an array via as.matrix if it is two-dimensional (e.g., a data frame) or via as.array.*" — thelatemail, Jun 06 '19 at 22:22
I hear ya. but I still want to know if there is a way to do it without a loop. — abalter, Jun 06 '19 at 23:05
...and that was my first comment - `lapply` over the row indexes `lapply(seq_len(nrow(df)), function(x) df[x,] )` or use `by` as per the question you linked - https://stackoverflow.com/a/1699296/496803 — thelatemail, Jun 06 '19 at 23:07
Please try to refrain from adding remarks on how people may vote here, either relating to downvotes, potential duplicates or whether the question is on-topic. Readers will vote how they will. In relation to potential duplicates, these are offered in good faith and a spirit of helpfulness - just deal with them if they arrive. No question author can possibly guarantee that Stack Overflow does not already have a possible duplicate for their question. — halfer, Jun 09 '19 at 19:23
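The `lapply` over row indexes suggested in the comments above could look roughly like this. It is a sketch under assumptions: `df_demo` and `describe_row` are hypothetical names introduced here, and the per-row function stands in for whatever processing is actually needed.

```r
# Illustrative data frame with fixed values (hypothetical name)
df_demo <- data.frame(A = c(0.5, 1.5, -0.25), B = letters[1:3])

# Hypothetical per-row function: it receives a one-row data frame,
# so `$` column access works just as it does in the for loop
describe_row <- function(row) {
  paste0(row$B, ": ", row$A)
}

# Iterate over the row indexes; drop = FALSE keeps the result a data frame
# even if the data frame has only one column
results <- lapply(seq_len(nrow(df_demo)), function(i) {
  describe_row(df_demo[i, , drop = FALSE])
})

is.data.frame(df_demo[1, , drop = FALSE])
results
```

Using `seq_len(nrow(df_demo))` rather than `1:nrow(df_demo)` means a zero-row data frame produces an empty list instead of iterating over `1:0`.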

score 1 · Accepted Answer · answered Jun 06 '19 at 23:15

Other than lapply over row indexes, you can also use lapply with split. Note that I'm assigning the result to prevent the output list printing.

df = data.frame(A=rnorm(3), B=letters[1:3])

row_fun <- function(row) {
  print(row)
  print(class(row))
  print(typeof(row))
  print(row$A)
  print(row$B)
}

test <- lapply(split(df, 1:nrow(df)), row_fun)
#>            A B
#> 1 -0.1566198 a
#> [1] "data.frame"
#> [1] "list"
#> [1] -0.1566198
#> [1] a
#> Levels: a b c
#>            A B
#> 2 -0.2241851 b
#> [1] "data.frame"
#> [1] "list"
#> [1] -0.2241851
#> [1] b
#> Levels: a b c
#>           A B
#> 3 -1.028928 c
#> [1] "data.frame"
#> [1] "list"
#> [1] -1.028928
#> [1] c
#> Levels: a b c

The latest version of dplyr also provides group_map, which can be adapted to expose each row to a function as a one-row data frame via the pronoun .x (instead of as a vector, which you could already do with purrr::pmap). We just have to create a rowid variable to group on. Note that this also coerces plain data.frames to tbl_df.

library(tidyverse)
test2 <- df %>%
  rowid_to_column() %>%
  group_by(rowid) %>%
  group_map(~ row_fun(.x))
#> # A tibble: 1 x 2
#>        A B    
#>    <dbl> <fct>
#> 1 -0.157 a    
#> [1] "tbl_df"     "tbl"        "data.frame"
#> [1] "list"
#> [1] -0.1566198
#> [1] a
#> Levels: a b c
#> # A tibble: 1 x 2
#>        A B    
#>    <dbl> <fct>
#> 1 -0.224 b    
#> [1] "tbl_df"     "tbl"        "data.frame"
#> [1] "list"
#> [1] -0.2241851
#> [1] b
#> Levels: a b c
#> # A tibble: 1 x 2
#>       A B    
#>   <dbl> <fct>
#> 1 -1.03 c    
#> [1] "tbl_df"     "tbl"        "data.frame"
#> [1] "list"
#> [1] -1.028928
#> [1] c
#> Levels: a b c

^{Created on 2019-06-06 by the reprex package (v0.3.0)}
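For contrast with the one-row data frame approaches, the argument-style iteration mentioned alongside `purrr::pmap` has a rough base R counterpart in `Map`. This is a swapped-in technique, not the purrr API, and `df_demo` and `describe_values` are hypothetical names used only for illustration.

```r
# Illustrative data frame with fixed values (hypothetical name);
# stringsAsFactors = FALSE keeps B as plain character on older R versions
df_demo <- data.frame(A = c(0.5, 1.5, -0.25), B = letters[1:3],
                      stringsAsFactors = FALSE)

# Hypothetical function taking one argument per column
describe_values <- function(A, B) paste0(B, ": ", A)

# Map() walks the columns in parallel, so each call receives the values of
# one row as named arguments rather than as a one-row data frame
results <- do.call(Map, c(list(f = describe_values), df_demo))
results
```

This keeps each column's type intact, but the function has to name the columns as arguments, so it is less convenient when the code expects `row$A` style access.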

`lapply` + `split` is essentially `by` too, which then makes this very close to a duplicate of the earlier questions which proposed the same solution. — thelatemail, Jun 06 '19 at 23:32
@Calum -- wish I could give you an extra +1 for teaching me about `reprex`! — abalter, Jun 08 '19 at 19:02
@thelatemail -- I see your point. We'll see if someone flags it ;) Heck, maybe I'll flag it myself. — abalter, Jun 08 '19 at 19:04
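The `by` variant referred to in these comments might be sketched as follows, assuming grouping by row number so that each group is a single row. `df_demo` is a hypothetical name, and the two small functions are placeholders for real per-row processing.

```r
# Illustrative data frame with fixed values (hypothetical name)
df_demo <- data.frame(A = c(0.5, 1.5, -0.25), B = letters[1:3])

# by() splits the data frame by the given indices and calls the function on
# each piece; grouping by row number yields one-row data frames
row_classes <- by(df_demo, seq_len(nrow(df_demo)), function(row) class(row)[1])
a_values <- by(df_demo, seq_len(nrow(df_demo)), function(row) row$A)

# The results carry class "by"; as.vector() strips that down to a plain vector
as.vector(row_classes)
as.vector(a_values)
```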
