
I am trying to create a bunch of columns that are quantile cuts of several existing columns. For example:

dataset[,412:422] <- NA

for( i in 50:60){
for(j in 412:422){
     dataset[,j] <- cut(dataset[,i], 
                                      breaks=unique(quantile(dataset[,i],probs=seq(.1,1,by=.1),na.rm=T)), 
                                      include.lowest=TRUE)
    } 
}

I want to create new columns 412 to 422 by binning the continuous variables in columns 50 to 60. When I run the code above, all I get back is:

   V412    V413    V414    V415    V416    V417    V418    V419 V420    V421    V422
(56,64] (56,64] (56,64] (56,64] (56,64] (56,64] (56,64] (56,64] (56,64] (56,64] (56,64]
 <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>  <NA>    <NA>    <NA>


......

<NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA> <NA>    <NA>    <NA>

I am not sure where I am going wrong. Any help would be greatly appreciated!!!

  • Are you sure you want a double loop here? You'll be assigning new values to the column `dataset[,j]` 11 different times. You really should also include a *minimal* [reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input data so we can run your code and test possible solutions. – MrFlick Jun 11 '15 at 16:29
  • I am a novice, so bear with me. I was thinking of a double loop so that I could add columns to the existing data set. – Jose Melendez Jun 11 '15 at 16:35
  • You don't need a double loop for that. Again, provide a reproducible example and we can probably show you a better way. – MrFlick Jun 11 '15 at 16:39
  • The data set I am working with has over 19k rows, and over 400 variables. I think that would be too much to try and place here. – Jose Melendez Jun 11 '15 at 16:50

1 Answer

This question is more about being organized and neat with your data. There are many ways to do this.

I would recommend separating out the data you want to bin into its own data.frame.

x=dataset[, 50:60]

Then bin those columns into new columns by writing a function with the parameters you want and running it over every column with apply.

function:

mycut=function(x)  cut(x, 
                       breaks=unique(quantile(x,probs=seq(.1,1,by=.1),na.rm=T)), 
                       include.lowest=TRUE)

apply:

xbin=apply(x,2,mycut)

Then put xbin back into your dataset and name the new columns appropriately.
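For completeness, here is a minimal end-to-end sketch of the steps above on made-up data. The toy `dataset`, the `src` index vector, and the `_bin` name suffix are all assumptions for illustration, not anything from the question. It differs from the snippets above in two ways: `probs` starts at 0 rather than .1 (an assumption about what is wanted), because otherwise values below the 10th percentile fall outside the breaks and become NA; and it uses `lapply` rather than `apply`, because `apply` coerces factor results to a character matrix. It also avoids the double loop from the question, which overwrites each new column 11 times so that every one ends up holding the bins of column 60.

```r
# Toy stand-in for the real data: 3 continuous columns, 100 rows (made-up values).
set.seed(1)
dataset <- data.frame(a = rnorm(100), b = runif(100), c = rexp(100))

src <- 1:3  # columns to bin; this would be 50:60 in the question

# Decile cut. probs starts at 0 here (an assumption; the original starts at .1)
# so that the smallest values land in the first bin instead of becoming NA.
mycut <- function(x) {
  cut(x,
      breaks = unique(quantile(x, probs = seq(0, 1, by = 0.1), na.rm = TRUE)),
      include.lowest = TRUE)
}

# lapply() returns a list with one factor per source column.
xbin <- lapply(dataset[src], mycut)
names(xbin) <- paste0(names(dataset)[src], "_bin")  # hypothetical naming scheme

# Append the binned columns to the original data frame.
dataset <- cbind(dataset, as.data.frame(xbin))
```

On the real data, setting `src <- 50:60` should append 11 new factor columns after the existing ones, each named after the column it was cut from.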

Seth