Still waiting for you to post your bug, but here's an R coding tip which will reduce this to a one-liner (and bypass your bug). Also it'll be way faster (it's vectorized, unlike your for-loop and if-else-ladder).
data$Partoftheday <- as.character(
cut(data$Requesthours,
breaks=c(-1,3,6,12,16,20,24),
labels=c('Midnight', 'Early Morning', 'Morning', 'Afternoon', 'Evening', 'Night')
)
)
# see Notes on cut() at bottom to explain this
Now back to your bug: You're confused about how to iterate over a column in R. for(i in data$Requesthours)
is trying to iterate over your df, but you're confusing indices with data values. Also you try to make i
an iterator, but then you don't refer to the value i
anywhere inside the loop, you refer back to data$Requesthours
, which is an entire column not a single value (how do the loop contents known which value you're referring to? They don't. You could use an ugly explicit index-loop like for (i in 1:nrow(data) ...
or for (i in seq_along(data) ...
then access data[i,]$Requesthours
, but please don't. Because...
One of the huge idiomatic things about learning R is generally when you write a for-loop to iterate over a dataframe or a df column, you should stop to think (or research) if there isn't a vectorized function in R that does what you want. cut, if, sum, mean, max, diff, stdev, ...
fns are all vectorized, as are all the arithmetic and logical operators. 'vectorized' means you can feed them an entire (column) vector as an input, and they produce an entire (column) vector as output which you can directly assign to your new column. Very simple, very fast, very powerful. Generally beats the pants off for-loops. Please read R-intro.html, esp. Section 2 about vector assignment
And if you can't find or write a vectorized fn, there's also the *apply
family of functions apply, sapply, lapply, ...
to apply any arbitrary function you want to a list/vector/dataframe/df column.
Notes on cut()
cut(data, breaks, labels, ...)
is a function where data
is your input vector (e.g. your selected column data$Requesthours
), breaks
is a vector of integer or numeric, and labels
is a vector to name the output. The length of labels is one more than breaks, since 5 breaks divides your data into 6 ranges.
- We want the output vector to be string, not categorical, hence we apply
as.character()
to the output from cut()
- Since your first if-else comparison is
(hr>=0 & hr<3)
, we have to fiddle the lowest cutoff_hour 0 to -1, otherwise hr==0 would wrongly give NA. (There is a parameter include.lowest=TRUE/FALSE
but it's not what you want, because it would also cause hr==3 to be 'Midnight', hr==6 to be 'Early Morning', etc.)