Proper way to loop over the length of a dataframe in R

Question

After quite a bit of debugging today, to my dismay I found that:

for (i in 1:0) {
     print(i)
}

Actually prints 1 and 0 respectively in R. The problem came up when writing

for (i in 1:nrow(myframe)) {
     fn(i)
}

Which I had intended to not execute at all if nrow(myframe) == 0. Is the proper correction just:

if (nrow(myframe) != 0) {
    for (i in 1:nrow(myframe)) {
        fn(i)
    }
}

Or is there a more proper way to do what I wanted in R?

talat · Accepted Answer · 2014-07-23T18:03:13.603

You can use seq_along instead:

vec <- numeric() 
length(vec)
#[1] 0

for(i in seq_along(vec)) print(i)   # doesn't print anything

vec <- 1:5

for(i in seq_along(vec)) print(i)
#[1] 1
#[1] 2
#[1] 3
#[1] 4
#[1] 5

Edit after OP update

df <- data.frame(a = numeric(), b = numeric())
df
#[1] a b
#<0 rows> (or 0-length row.names)

for(i in seq_len(nrow(df))) print(i)    # doesn't print anything

df <- data.frame(a = 1:3, b = 5:7)

for(i in seq_len(nrow(df))) print(i)
#[1] 1
#[1] 2
#[1] 3
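To make the difference concrete, here is a minimal sketch comparing the two index sequences for a length of zero. The helper count_iterations is a hypothetical name introduced only for this illustration.

```r
# Index sequences for an empty length.
n_empty  <- 0L
bad_idx  <- 1:n_empty          # the colon operator counts down: 1, 0
good_idx <- seq_len(n_empty)   # zero-length integer vector

# Hypothetical helper: count how many times a for loop body would run.
count_iterations <- function(idx) {
  n <- 0L
  for (i in idx) n <- n + 1L
  n
}

iters_bad  <- count_iterations(bad_idx)   # loop body runs twice
iters_good <- count_iterations(good_idx)  # loop body never runs
```

The loop over seq_len(0) is skipped entirely, so no explicit if guard around the loop is needed.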

score 4 · Answer 2 · answered Jul 23 '14 at 18:04

Regarding the edit, see the counterpart function seq_len(), used as seq_len(NROW(myframe)). This case is exactly why you should not use 1:N in a for() loop: the value that ends up replacing N may be 0 or negative.

An alternative (which just hides the loop) is to do apply(myframe, 1, FUN = foo) where foo is a function containing the things you want to do to each row of myframe and will probably just be cut and paste from the body of the loop.
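As a rough sketch of the apply() approach, assuming a small all-numeric data frame and a hypothetical row function foo (neither is from the original question):

```r
# Example data; the column names a and b are assumptions for illustration.
myframe <- data.frame(a = 1:3, b = 5:7)

# Hypothetical row function: each row arrives as a named vector.
foo <- function(row) row[["a"]] + row[["b"]]

# MARGIN = 1 means "apply foo to each row".
row_sums <- apply(myframe, 1, FUN = foo)
```

Note that apply() converts the data frame to a matrix first, so a frame with mixed column types would reach foo as character values; this sketch sidesteps that by using numeric columns only.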

score 4 · Answer 3 · answered Jul 23 '14 at 18:05


For vectors there is seq_along; for data frames you may use seq_len on the row count:

for(i in seq_len(nrow(the.table))){
    do.stuff()
}

answered Jul 23 '14 at 18:05

Boris Gorelik


score 3 · Answer 4 · answered Sep 16 '17 at 16:25

Clearly all previous answers do the job.

I like to have something like this:

rows_along <- function(df) seq_len(nrow(df))

and then

for(i in rows_along(df)) # do stuff

Totally idiosyncratic answer; it is just a wrapper, but I think it is more readable and intuitive.
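One caveat worth checking with such a wrapper: seq(0) evaluates to 1 0, just like 1:0, so the wrapper only handles empty data frames if it is built on seq_len. A minimal sketch, with empty_df and full_df as made-up example data:

```r
# Wrapper built on seq_len so that zero rows give a zero-length sequence.
rows_along <- function(df) seq_len(nrow(df))

empty_df <- data.frame(a = numeric(), b = numeric())
full_df  <- data.frame(a = 1:3, b = 5:7)

# Record which indices each loop visits.
visited_empty <- integer()
for (i in rows_along(empty_df)) visited_empty <- c(visited_empty, i)

visited_full <- integer()
for (i in rows_along(full_df)) visited_full <- c(visited_full, i)

# For comparison: plain seq() on a zero row count still counts down.
seq_on_zero <- seq(nrow(empty_df))
```

Here visited_empty stays empty while visited_full collects 1, 2, 3.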

score 1 · Answer 5 · edited May 23 '17 at 12:24


I think the most proper way in R is to use an apply function. More often than not, there is an apply function that does what you need, and more often than not, you don't need an index sequence at all.

Here's an example that applies diff to each column, or each row.

> d <- data.frame(x = 1:5, y = 6:10)

over the columns,

> lapply(d, diff)
$x
[1] 1 1 1 1

$y
[1] 1 1 1 1

across the rows,

> apply(d, 1, diff)
[1] 5 5 5 5 5

over the columns again, returning a matrix

> sapply(d, diff)
     x y
[1,] 1 1
[2,] 1 1
[3,] 1 1
[4,] 1 1

See this link for a most excellent explanation about apply
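If the row index itself is needed, one possible middle ground is to combine an apply-family function with seq_len. The sketch below assumes the same data frame d as above and a hypothetical helper row_diff; it is one way to do this, not the only one.

```r
d <- data.frame(x = 1:5, y = 6:10)

# Hypothetical helper: difference between y and x in row i.
row_diff <- function(i, df) as.numeric(df$y[i] - df$x[i])

# vapply checks that every result is a single numeric value.
res <- vapply(seq_len(nrow(d)), row_diff, numeric(1), df = d)

# With zero rows, the result is simply a zero-length numeric vector.
d_empty   <- d[0, ]
res_empty <- vapply(seq_len(nrow(d_empty)), row_diff, numeric(1), df = d_empty)
```

Because seq_len(0) is empty, row_diff is never called for d_empty, and vapply still returns a value of the declared type.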


answered Jul 23 '14 at 18:09

Rich Scriven


"I think the most proper way in R is to use an apply function" -- with all due respect, I don't think this is good advice. It's OK for there to be two or more ways to do something, but the "wrong way" can't be the most obvious and usually workable way, with the "right way" trailing somewhere behind it; that's just messing with people's heads. For what it's worth. – Robert Dodier Apr 30 '20 at 21:43
