
Is there an "apply"-type function that lets us iterate through a data.frame and process the rows in exactly the same way as if we were looping? When I do apply(df, 1, function(row){...}), the row passed to the function is NOT an actual data.frame row.

df = data.frame(A=rnorm(3), B=letters[1:3])

for (i in 1:3)
{
  row = df[i,]
  print(row)
  print(class(row))
  print(typeof(row))
  print(row$A)
  print(row$B)
}

apply(df, 1, function(row)
{
  print(row)
  print(class(row))
  print(typeof(row))
  print(row$A)
  print(row$B)
})
> df = data.frame(A=rnorm(3), B=letters[1:3])
> 
> for (i in 1:3)
+ {
+     row = df[i,]
+     print(row)
+     print(class(row))
+     print(typeof(row))
+     print(row$A)
+     print(row$B)
+ }
          A B
1 0.4179416 a
[1] "data.frame"
[1] "list"
[1] 0.4179416
[1] a
Levels: a b c
        A B
2 1.35868 b
[1] "data.frame"
[1] "list"
[1] 1.35868
[1] b
Levels: a b c
           A B
3 -0.1027877 c
[1] "data.frame"
[1] "list"
[1] -0.1027877
[1] c
Levels: a b c
> 
> apply(df, 1, function(row)
+ {
+     print(row)
+     print(class(row))
+     print(typeof(row))
+     print(row$A)
+     print(row$B)
+ })
           A            B 
" 0.4179416"          "a" 
[1] "character"
[1] "character"
Error in row$A : $ operator is invalid for atomic vectors

Edit 1

A comment on this answer says that apply turns the data.frame into a matrix, so each row arrives as a plain vector. I guess that's the problem. Maybe it's time for a dedicated data.frame iterator?
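A small base-R sketch of that coercion, for illustration only (assumption: fixed values replace rnorm() so the output is reproducible). Because the data frame has a non-numeric column, as.matrix() produces a character matrix, so inside apply() each row is a named character vector: `[[` by name works, `$` does not, and numeric columns have to be converted back.

```r
# Illustrative data (fixed values instead of rnorm() so the output is stable)
df <- data.frame(A = c(0.5, -1.2, 2), B = letters[1:3])

# apply() first coerces the data frame with as.matrix(); with a
# non-numeric column present, the whole matrix becomes character.
m <- as.matrix(df)
print(typeof(m))   # "character"

# Each "row" handed to FUN is therefore a named character vector:
# `[[` by name works, `$` does not, and numbers must be converted back.
a_values <- apply(df, 1, function(row) as.numeric(row[["A"]]))
b_values <- apply(df, 1, function(row) row[["B"]])
print(a_values)
print(b_values)
```

Note that as.matrix() formats numeric columns to a common width (hence the leading space in `" 0.4179416"` in the output above), which as.numeric() tolerates.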

Edit 2

As @thelatemail pointed out, this may really be a duplicate of "For each row in an R dataframe".

halfer
abalter
  • I don't think there's a way to avoid it using `apply` - you could `lapply` over `seq_len(nrow(df))` though if you strictly want to avoid using `for` – thelatemail Jun 06 '19 at 22:18
  • To be clear this is documented in `?apply` - "*If X is not an array but an object of a class with a non-null dim value (such as a data frame), apply attempts to coerce it to an array via as.matrix if it is two-dimensional (e.g., a data frame) or via as.array.*" – thelatemail Jun 06 '19 at 22:22
  • I hear ya, but I still want to know if there is a way to do it without a loop. – abalter Jun 06 '19 at 23:05
  • ...and that was my first comment - `lapply` over the row indexes `lapply(seq_len(nrow(df)), function(x) df[x,] )` or use `by` as per the question you linked - https://stackoverflow.com/a/1699296/496803 – thelatemail Jun 06 '19 at 23:07
  • Please try to refrain from adding remarks on how people may vote here, either relating to downvotes, potential duplicates or whether the question is on-topic. Readers will vote how they will. In relation to potential duplicates, these are offered in good faith and a spirit of helpfulness - just deal with them if they arrive. No question author can possibly guarantee that Stack Overflow does not already have a possible duplicate for their question. – halfer Jun 09 '19 at 19:23
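A minimal sketch of the "lapply over the row indexes" suggestion from the comments, again with fixed illustrative values. Because `df[i, ]` keeps the data.frame class, the function sees the same one-row data frame a for loop would, so `$` works.

```r
df <- data.frame(A = c(0.5, -1.2, 2), B = letters[1:3])

# Iterate over row positions; df[i, ] keeps the data.frame class,
# so each call sees exactly what a for loop would.
rows <- lapply(seq_len(nrow(df)), function(i) df[i, ])

row_classes <- vapply(rows, function(r) class(r), character(1))
print(row_classes)   # "data.frame" "data.frame" "data.frame"

# `$` now works inside the function, just as in the loop.
a_values <- vapply(seq_len(nrow(df)), function(i) df[i, ]$A, numeric(1))
print(a_values)
```

Using seq_len(nrow(df)) rather than 1:nrow(df) also avoids iterating over c(1, 0) when the data frame has zero rows.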

1 Answer


Other than lapply over row indexes, you can also use lapply with split. Note that I'm assigning the result to stop the returned list from printing.

df = data.frame(A=rnorm(3), B=letters[1:3])

row_fun <- function(row) {
  print(row)
  print(class(row))
  print(typeof(row))
  print(row$A)
  print(row$B)
}

test <- lapply(split(df, 1:nrow(df)), row_fun)
#>            A B
#> 1 -0.1566198 a
#> [1] "data.frame"
#> [1] "list"
#> [1] -0.1566198
#> [1] a
#> Levels: a b c
#>            A B
#> 2 -0.2241851 b
#> [1] "data.frame"
#> [1] "list"
#> [1] -0.2241851
#> [1] b
#> Levels: a b c
#>           A B
#> 3 -1.028928 c
#> [1] "data.frame"
#> [1] "list"
#> [1] -1.028928
#> [1] c
#> Levels: a b c
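If you want something closer to a "dedicated data.frame iterator", the split approach can be wrapped in a small function. `for_each_row()` is a hypothetical name used only for this sketch; it is not a base R or package function, just a thin wrapper around split() and lapply().

```r
# Hypothetical helper (not part of base R): call FUN on each row of a
# data frame, passing the row as a one-row data.frame.
for_each_row <- function(df, FUN, ...) {
  # split() by row position yields a list of one-row data frames,
  # named "1", "2", ... after the grouping values.
  pieces <- split(df, seq_len(nrow(df)))
  lapply(pieces, FUN, ...)
}

df <- data.frame(A = c(0.5, -1.2, 2), B = letters[1:3])
out <- for_each_row(df, function(row) paste(row$B, row$A))
print(out)
```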

The latest version of dplyr (at the time of writing) also provides group_map, which can be adapted to pass each row to a function as a one-row data frame via the pronoun .x (rather than as a vector, which you could already do with purrr::pmap). We just have to create a rowid variable to group on. Note that this also coerces plain data.frames to tbl_df.

library(tidyverse)
test2 <- df %>%
  rowid_to_column() %>%
  group_by(rowid) %>%
  group_map(~ row_fun(.x))
#> # A tibble: 1 x 2
#>        A B    
#>    <dbl> <fct>
#> 1 -0.157 a    
#> [1] "tbl_df"     "tbl"        "data.frame"
#> [1] "list"
#> [1] -0.1566198
#> [1] a
#> Levels: a b c
#> # A tibble: 1 x 2
#>        A B    
#>    <dbl> <fct>
#> 1 -0.224 b    
#> [1] "tbl_df"     "tbl"        "data.frame"
#> [1] "list"
#> [1] -0.2241851
#> [1] b
#> Levels: a b c
#> # A tibble: 1 x 2
#>       A B    
#>   <dbl> <fct>
#> 1 -1.03 c    
#> [1] "tbl_df"     "tbl"        "data.frame"
#> [1] "list"
#> [1] -1.028928
#> [1] c
#> Levels: a b c

Created on 2019-06-06 by the reprex package (v0.3.0)

Calum You
  • `lapply` + `split` is essentially `by` too, which then makes this very close to a duplicate of the earlier questions which proposed the same solution. – thelatemail Jun 06 '19 at 23:32
  • @Calum -- wish I could give you an extra +1 for teaching me about `reprex`! – abalter Jun 08 '19 at 19:02
  • @thelatemail -- I see your point. We'll see if someone flags it ;) Heck, maybe I'll flag it myself. – abalter Jun 08 '19 at 19:04
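To illustrate the point that `lapply` + `split` is essentially `by`, this sketch computes the same per-row result both ways with base R (fixed illustrative values; the `row$A * 10` computation is arbitrary). by() returns an object of class "by", so its attributes are stripped before comparing.

```r
df <- data.frame(A = c(0.5, -1.2, 2), B = letters[1:3])
row_id <- seq_len(nrow(df))

# by() splits the data frame by INDICES and calls FUN on each piece,
# so each piece is a one-row data.frame, just as with lapply + split.
via_by    <- by(df, row_id, function(row) row$A * 10)
via_split <- lapply(split(df, row_id), function(row) row$A * 10)

# by() returns an object of class "by" (here a 1-d numeric array), so
# strip its attributes before comparing it with the unlisted lapply result.
values_by    <- as.vector(unclass(via_by))
values_split <- unname(unlist(via_split))
print(values_by)
print(values_split)
```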