How to create an inclusive binning function in R?

Question

I'm trying to create a function that bins data based on several conditions. My data frame has two columns: max_dist (numeric) and activated (logical).

The function should create one vector per bin. For each observation it checks whether max_dist falls within the bin's range: if it does and activated is TRUE, it appends a 1 to that bin's vector; if activated is FALSE, it appends a 0.

The key part is that, for each observation, if max_dist is above the bin's range but activated is TRUE, I would like to add a 0 to that bin as well. So some observations with high max_dist values will appear in several bins.
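As a sketch of the rule described above, the per-bin logic can be written with vectorized comparisons instead of a loop. This assumes activated is a logical vector without NAs and reads the conditions from the loop in the question: 1 when activated is TRUE and max_dist is in (lower, upper]; 0 when max_dist is above upper or activated is FALSE; otherwise (activated rows below the bin) the observation is left out of that bin. The helper name bin_values, the small test data, and the breaks c(0.2, 0.4) are illustrative assumptions; the upper limits of the other four bins are not stated in the question.

```r
# Hypothetical helper (not from the question): the 0/1 values for one bin
# with range (lower, upper]. Assumes activated is logical without NAs.
bin_values <- function(max_dist, activated, lower, upper) {
  in_range <- max_dist > lower & max_dist <= upper
  hit  <- activated & in_range           # activated and inside the bin -> 1
  miss <- max_dist > upper | !activated  # above the bin or not activated -> 0
  keep <- hit | miss                     # activated rows below the bin are dropped
  as.integer(hit[keep])
}

# Small illustrative data, not from the question
max_dist  <- c(0.1, 0.3, 0.5, 0.15)
activated <- c(TRUE, TRUE, TRUE, FALSE)

breaks <- c(0.2, 0.4)                # upper limits; add the remaining bins here
lowers <- c(-Inf, head(breaks, -1))  # first bin has no lower limit, as in the loop
bins <- Map(function(lo, hi) bin_values(max_dist, activated, lo, hi), lowers, breaks)
names(bins) <- c("two_hundred", "four_hundred")
# two_hundred is c(1, 0, 0, 0); four_hundred is c(1, 0, 0) (the 0.1 row is left out)
```

Because each bin keeps only the rows it applies to, the vectors can have different lengths, which matches the append-based design in the question.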

Currently I have structured it like this (shortened version; the full version has 6 bins):

binning_function <- function(df) {
  # create a series of vectors corresponding to bins
  two_hundred <- c()
  four_hundred <- c()

  # iterate through the data frame to add 0 or 1 values to each vector
  for (i in 1:nrow(df)) {
    if (df$activated[i]==TRUE && df$max_dist[i]<=0.2) {
      append(two_hundred, 1)
    }
    else if (df$max_dist[i]>0.2 || df$activated[i]==FALSE) {
      append(two_hundred, 0)
    }
  }

  for (i in 1:nrow(df)) {
    if (df$activated[i]==TRUE && df$max_dist[i]>0.2 && df$max_dist[i]<=0.4) {
      append(four_hundred, 1)
    }
    else if (df$max_dist[i]>0.4 || df$activated[i]==FALSE) {
      append(four_hundred, 0)
    }
  }

  return(list(two_hundred,four_hundred))

}

When I run this function on a data frame, it returns this list:

[[1]]
NULL

[[2]]
NULL
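The NULLs come from how append() works: like most R functions, it returns a modified copy and leaves its argument unchanged. The result of append(two_hundred, 1) is therefore discarded, and two_hundred stays c(), which is NULL. A minimal sketch of the fix keeps the loop from the question and only assigns the result back; it is shown for the two_hundred bin, and four_hundred would need the same change. The small example vectors v and w are illustrative only.

```r
# append() does not modify v in place; it returns a new vector
v <- c()            # c() with no arguments is NULL
w <- append(v, 1)   # w is 1, v is still NULL

binning_function <- function(df) {
  two_hundred <- c()
  # seq_len() also behaves correctly for a data frame with zero rows
  for (i in seq_len(nrow(df))) {
    if (df$activated[i] && df$max_dist[i] <= 0.2) {
      two_hundred <- append(two_hundred, 1)   # assign the result back
    } else if (df$max_dist[i] > 0.2 || !df$activated[i]) {
      two_hundred <- append(two_hundred, 0)
    }
  }
  list(two_hundred)
}
```

Growing a vector this way copies it at each step, so it can become slow on large data frames; vectorized comparisons avoid both the loop and the repeated copying.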

EJBailey, please try opening a completely fresh R session (such as what I have here), load this code, and then realize that there is no data on which to work. In addition to some sample data (*please* read the links below), it also helps to know your expected output given that data. Good refs: https://stackoverflow.com/questions/5963269/, https://stackoverflow.com/help/mcve, and https://stackoverflow.com/tags/r/info. — r2evans, Aug 29 '18 at 05:27

score 0 · Answer 1 · answered Aug 29 '18 at 11:34

The solution below uses apply() to run a function on every row of the data frame, so there is no explicit for loop and no need to initialize empty vectors in advance. It also uses ifelse() in place of the longer if () {} else {} statements:

# sample data: 100 rows with random max_dist and activated values
data <- data.frame(row.names = paste0('s', 1:100))
data$max_dist <- runif(100, 0, 1)
data$activated <- sample(c(TRUE, FALSE), 100, replace = TRUE)

binning_function <- function(df) {
  # 1 if activated and max_dist <= 0.2, otherwise 0
  two_hundred <- apply(df, 1, function(x) {ifelse(x['max_dist'] <= 0.2 & x['activated'], 1, 0)})
  # 1 if activated and 0.2 < max_dist <= 0.4, otherwise 0
  four_hundred <- apply(df, 1, function(x) {ifelse(x['max_dist'] <= 0.4 & x['max_dist'] > 0.2 & x['activated'], 1, 0)})
  return(list(two_hundred, four_hundred))
}

binning_function(df = data)
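A note on the approach above: apply() first converts the data frame to a matrix (with a numeric max_dist and a logical activated, activated becomes 0/1) and then calls the function once per row. Since comparison operators and & are already vectorized in R, the same 1/0 values can be computed on whole columns without apply() or ifelse(). The sketch below assumes that; the name binning_function_vec and the breaks argument are hypothetical, and each bin comes back as an unnamed integer vector rather than the row-named numeric vectors apply() produces. Like the answer, it gives every row a value in every bin; it does not drop activated rows that fall below a bin, as the question's loop does.

```r
# Hypothetical vectorized variant of the answer's function: one 0/1 vector per bin,
# 1 when activated is TRUE and lower < max_dist <= upper, otherwise 0
binning_function_vec <- function(df, breaks = c(0.2, 0.4)) {
  lowers <- c(-Inf, head(breaks, -1))  # the first bin has no lower limit
  lapply(seq_along(breaks), function(k) {
    as.integer(df$activated & df$max_dist > lowers[k] & df$max_dist <= breaks[k])
  })
}

# Example using the answer's sample-data recipe (seeded for reproducibility)
set.seed(1)
data <- data.frame(max_dist = runif(100, 0, 1),
                   activated = sample(c(TRUE, FALSE), 100, replace = TRUE))
res <- binning_function_vec(data)
```

For the full six-bin version, breaks would hold all six upper limits; the question does not give the four beyond 0.4.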

