Bug in my for-loop to iterate over data frame

Question

I am working on a data frame and have extracted one of the columns, which holds hour data from 0 to 23. I am adding one more column for the part of the day, based on the hour. I ran the for loop below but am getting an error. Can somebody tell me what is wrong with the syntax below and how to correct it?

for(i in data$Requesthours) {
   if(data$Requesthours>=0 & data$Requesthours<3) {
     data$Partoftheday <- "Midnight"
   } else if(data$Requesthours>=3 & data$Requesthours<6) {
     data$Partoftheday <- "Early Morning"
   } else if(data$Requesthours>=6 & data$Requesthours<12) {
     data$Partoftheday <- "Morning"
   } else if(data$Requesthours>=12 & data$Requesthours<16) {
     data$Partoftheday <- "Afternoon"
   } else if(data$Requesthours>=16 & data$Requesthours<20) {
     data$Partoftheday <- "Evening"
   } else if(data$Requesthours>=20 & data$Requesthours<=23) {
     data$Partoftheday <- "Night"
   }
}

a) "kept getting error" tells us nothing; please post the text of the error. Include the line of code it references so we can see was it caused by the for-statement or the if-statement? b) Also we don't have your data, please add a Minimal Reproducible Example **[How to make a great R reproducible example?](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example/5963610#5963610)**. c) Also please format and indent your code so it's legible; I did that for you this time. — smci, Jun 05 '17 at 23:14
Also, see my answer for a coding tip which would reduce this to a one-liner (and bypass your bug) — smci, Jun 05 '17 at 23:27
Still waiting for you to post your bug. I suspected you have NAs in your data$Requesthours column. But no actually it's... — smci, Jun 05 '17 at 23:31
Your bug is `for(i in data$Requesthours)` is trying to iterate over your dataframe, but confusing row-indices with data values. — smci, Jun 05 '17 at 23:47

smci · Answer 1 · 2017-06-06T00:15:39.853

Still waiting for you to post your bug, but here's an R coding tip which will reduce this to a one-liner (and bypass your bug). Also it'll be way faster (it's vectorized, unlike your for-loop and if-else-ladder).

data$Partoftheday <- as.character(
  cut(data$Requesthours,
      breaks=c(-1,3,6,12,16,20,24),
      labels=c('Midnight', 'Early Morning', 'Morning', 'Afternoon', 'Evening', 'Night')
  )
)
# see Notes on cut() at bottom to explain this

Now back to your bug: you're confused about how to iterate over a column in R. for(i in data$Requesthours) iterates over the values in that column, not over row indices, so you're confusing indices with data values. Also you try to make i an iterator, but then you don't refer to i anywhere inside the loop; you refer back to data$Requesthours, which is an entire column, not a single value (how do the loop contents know which value you're referring to? They don't). You could use an ugly explicit index loop like for (i in 1:nrow(data)) ... or for (i in seq_along(data$Requesthours)) ... (note that seq_along(data) on a data frame would count columns, not rows) and then access data$Requesthours[i], but please don't. Because...

One of the big idiomatic things about learning R: whenever you write a for-loop to iterate over a dataframe or a df column, you should stop to think (or research) whether there is a vectorized function in R that does what you want. cut, ifelse, sum, mean, max, diff, sd, ... all take a whole vector as input, as do all the arithmetic and logical operators (plain if does not; ifelse is its vectorized counterpart). 'Vectorized' means you can feed them an entire (column) vector as input; the elementwise ones, such as cut and ifelse, produce an entire (column) vector as output which you can directly assign to your new column, while summaries such as sum and mean return a single value. Very simple, very fast, very powerful. Generally beats the pants off for-loops. Please read R-intro.html, esp. Section 2 about vector assignment.
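As a small illustration of what vectorized means here, the sketch below applies comparison, logical and arithmetic operators to a whole vector at once. The vector hrs and the variable names are made up for the example; they are not from the question.

```r
# Hypothetical toy vector of hours (not the asker's data)
hrs <- c(0, 3, 7, 13, 23)

# Comparison and logical operators work element by element
is_morning <- hrs >= 6 & hrs < 12

# ifelse() is the vectorized counterpart of if/else
label <- ifelse(is_morning, "Morning", "Other")

# Arithmetic operators are vectorized too
hrs_12 <- hrs %% 12
```

Each result has the same length as hrs, so any of them could be assigned straight to a new data frame column.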

And if you can't find or write a vectorized fn, there's also the *apply family of functions apply, sapply, lapply, ... to apply any arbitrary function you want to a list/vector/dataframe/df column.
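A minimal sketch of the sapply route, assuming a helper that classifies one hour at a time. The function part_of_day and the toy data frame are hypothetical, written only for this illustration.

```r
# Toy data standing in for the real data frame, which was never posted
data <- data.frame(Requesthours = c(0, 3, 7, 13, 18, 23))

# Hypothetical helper: takes ONE hour and returns ONE label
part_of_day <- function(hr) {
  if (hr < 3) {
    "Midnight"
  } else if (hr < 6) {
    "Early Morning"
  } else if (hr < 12) {
    "Morning"
  } else if (hr < 16) {
    "Afternoon"
  } else if (hr < 20) {
    "Evening"
  } else {
    "Night"
  }
}

# sapply() calls the helper once per element and collects the results
data$Partoftheday <- sapply(data$Requesthours, part_of_day)
```

This is still one function call per row under the hood, so for this particular task cut() is likely the better choice; sapply is for cases where no vectorized function fits.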

Notes on cut()

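cut(x, breaks, labels, ...) is a function where x is your input vector (e.g. your selected column data$Requesthours), breaks is a numeric vector of cut points, and labels is a vector to name the resulting intervals. The length of labels is one less than the length of breaks, since 7 break points delimit 6 intervals.
We want the output vector to be string, not categorical (a factor), hence we apply as.character() to the output from cut()
Your first if-else comparison is (hr>=0 & hr<3), i.e. an interval closed on the left and open on the right. By default cut() uses right=TRUE, which makes the intervals open on the left and closed on the right, e.g. (0,3]. With a lowest break of 0, hr==0 would give NA, which is why the lowest cutoff is moved to -1. The same default also puts hr==3 in 'Midnight', hr==6 in 'Early Morning', etc., which differs from your if-ladder at each boundary. To match the ladder exactly, pass right=FALSE with breaks=c(0,3,6,12,16,20,24), which gives [0,3), [3,6), ... (The parameter include.lowest=TRUE only closes the outermost interval; it does not change the interior boundaries.)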
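To check the boundary behaviour against the if-ladder in the question, here is a minimal sketch using right=FALSE, which makes each interval closed on the left and open on the right. The vector hrs is a made-up stand-in for data$Requesthours.

```r
# All possible hour values, as a stand-in for data$Requesthours
hrs <- 0:23

# right = FALSE gives [0,3), [3,6), [6,12), [12,16), [16,20), [20,24)
part <- as.character(
  cut(hrs,
      breaks = c(0, 3, 6, 12, 16, 20, 24),
      labels = c("Midnight", "Early Morning", "Morning",
                 "Afternoon", "Evening", "Night"),
      right = FALSE)
)

# Look at the boundary hours next to their labels
data.frame(hour = hrs, part = part)[hrs %in% c(0, 2, 3, 6, 23), ]
```

With these breaks, hour 0 falls in 'Midnight', hour 3 in 'Early Morning', and hour 23 in 'Night', with no NA values for hours 0 to 23.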

score 0 · Answer 2 · answered Jun 05 '17 at 23:39

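if(data$Requesthours>=0 & data$Requesthours<3) (and other similar ifs) make no sense, since data$Requesthours is a whole vector and if() expects a single TRUE or FALSE (older versions of R warn and use only the first element; R 4.2.0 and later raise an error). You should try either of the following: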

Solution 1:

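data$Partoftheday <- NA_character_   # create the column before filling it row by row
for(i in seq_along(data$Requesthours)) {   # seq_along() is also safe for zero-length input
    if(data$Requesthours[i]>=0 && data$Requesthours[i]<3)
        data$Partoftheday[i] <- "Midnight"
    ....
}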

This solution is slow like hell and really ugly, but it would work.
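Spelled out in full, the indexed loop might look like the sketch below. The toy data frame is an assumption (the real data was never posted), and the branches simply mirror the if-ladder from the question.

```r
# Toy data standing in for the real data frame
data <- data.frame(Requesthours = c(0, 2, 3, 7, 13, 18, 23))

# Create the column up front so each row can be filled in turn
data$Partoftheday <- NA_character_

for (i in seq_len(nrow(data))) {
  hr <- data$Requesthours[i]   # a single value, so if() gets a length-one condition
  if (hr >= 0 && hr < 3) {
    data$Partoftheday[i] <- "Midnight"
  } else if (hr >= 3 && hr < 6) {
    data$Partoftheday[i] <- "Early Morning"
  } else if (hr >= 6 && hr < 12) {
    data$Partoftheday[i] <- "Morning"
  } else if (hr >= 12 && hr < 16) {
    data$Partoftheday[i] <- "Afternoon"
  } else if (hr >= 16 && hr < 20) {
    data$Partoftheday[i] <- "Evening"
  } else if (hr >= 20 && hr <= 23) {
    data$Partoftheday[i] <- "Night"
  }
}
```

The key difference from the loop in the question is that every comparison and assignment goes through the index i, so each iteration touches exactly one row.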

Solution 2:

data$Partoftheday[data$Requesthours>=0 & data$Requesthours<3] <- "Midnight"
...
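As a sketch of how the logical-indexing approach could be carried through all six ranges, assuming the same boundaries as the if-ladder in the question (the toy data frame and the shorthand h are made up for the example):

```r
# Toy data standing in for the real data frame
data <- data.frame(Requesthours = c(0, 2, 3, 7, 13, 18, 23))
h <- data$Requesthours   # shorthand for the hour column

# Start with an all-NA character column, then fill each range in one vectorized step
data$Partoftheday <- NA_character_
data$Partoftheday[h >= 0  & h < 3]   <- "Midnight"
data$Partoftheday[h >= 3  & h < 6]   <- "Early Morning"
data$Partoftheday[h >= 6  & h < 12]  <- "Morning"
data$Partoftheday[h >= 12 & h < 16]  <- "Afternoon"
data$Partoftheday[h >= 16 & h < 20]  <- "Evening"
data$Partoftheday[h >= 20 & h <= 23] <- "Night"
```

Each line builds a logical vector with one entry per row and assigns the label only where it is TRUE, so no explicit loop is needed.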

Solution 3 = what was proposed by smci
