
I have a data.frame which consists of 12 columns. None of them are categorical variables. However, I want to create one categorical variable out of the first column (prices, in my case) and then split the data into these 4 categories.

I started out by reading the CSV file into a data.frame.

Next I sorted the rows by that one column (I want the categorical variable to be split into "low", "medium", "high", "very high"):

new <- old[order(old$price),]

Next I used the function cut() to split this one column into 4 intervals:

prices.new <- cut(new$price, breaks=4, labels=c("low","medium","high","very high"))
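One thing worth knowing about this step: with `breaks = 4`, cut() divides the *range* of the prices into four intervals of equal width, so the groups can hold very different numbers of rows, and sorting the data first makes no difference to that. If roughly equal-sized groups are wanted instead, a common alternative is to pass quartiles from quantile() as the breaks. A minimal sketch with made-up, skewed prices:

```r
# Made-up example data: a skewed price variable (hypothetical values).
price <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 100)

# breaks = 4: four intervals of equal WIDTH over the range of price.
equal_width <- cut(price, breaks = 4,
                   labels = c("low", "medium", "high", "very high"))
table(equal_width)

# Quartile breaks: four groups with roughly equal NUMBERS of rows.
quartile_breaks <- quantile(price, probs = seq(0, 1, by = 0.25))
equal_count <- cut(price, breaks = quartile_breaks, include.lowest = TRUE,
                   labels = c("low", "medium", "high", "very high"))
table(equal_count)
```

With these values the equal-width version puts nine prices in "low" and the single large one in "very high", while the quartile version gives groups of 3, 2, 2 and 3. Quartile breaks only work if they are distinct; with many tied prices, cut() stops with a "'breaks' are not unique" error.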

Now I want to replace the column old$price with prices.new:

new1 <- new[replace(new$price, prices.new)]

However, R always reports that an argument ("values") is missing.
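The error comes from replace() itself: its signature is replace(x, list, values), and the call above passes only two arguments, so `values` is missing. A minimal sketch of the usual approach, assuming a small made-up stand-in for `new`, is to assign the factor straight to a column:

```r
# Hypothetical stand-in for the sorted data frame `new` from the question.
new <- data.frame(price = c(12, 48, 75, 99, 30),
                  other = c(1.5, 2.0, 0.7, 3.1, 2.2))

prices.new <- cut(new$price, breaks = 4,
                  labels = c("low", "medium", "high", "very high"))

# replace(x, list, values) needs all three arguments; the question's call
# passes only two. Assigning the factor to a column does what was intended:
new1 <- new
new1$price <- prices.new        # overwrite the numeric prices ...
new$price.cat <- prices.new     # ... or keep them and add a new column

str(new1)
```

Keeping the numeric prices and adding a separate column is often handier, since the original values stay available next to their category.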

I also see a problem: I don't know whether the other values will still be comparable after that. (I want to compare these intervals with each other using an ANOVA afterwards.)
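On the comparability concern: the labels from cut() come out in the same order as the vector they were computed from. Assigning them as a column of that same data frame keeps each label in the row it belongs to; assigning labels computed from the sorted `new` back onto the unsorted `old` would pair them with the wrong rows. A sketch of the whole flow ending in a one-way ANOVA, assuming made-up data with a numeric column called `response` (a hypothetical name):

```r
# Hypothetical data: prices plus one numeric response to compare across groups.
set.seed(1)
df <- data.frame(price = runif(40, 0, 100), response = rnorm(40))

# The category is stored as a column of the same data frame, so every
# label stays in the same row as that row's other values.
df$price.cat <- cut(df$price, breaks = 4,
                    labels = c("low", "medium", "high", "very high"))

# One-way ANOVA of the response across the four price categories.
fit <- aov(response ~ price.cat, data = df)
summary(fit)
```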

Heike
  • What have you tried? Can you provide a [reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example)? – bouncyball Jun 04 '16 at 20:12
  • One thing to note is that all elements of matrices always have the same type. For example, if you try to coerce a column of a matrix into a character, you will coerce the entire matrix into a character matrix. To do what I think you are trying to do, you want to be working with a data.frame. – lmo Jun 04 '16 at 20:23
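The point in the comment above can be seen in a couple of lines (made-up values):

```r
# A matrix holds a single type: adding one character column turns
# every column into character.
m <- cbind(price = c(10, 20), label = c("low", "high"))
class(m[, "price"])   # "character"

# A data.frame keeps a separate type per column.
d <- data.frame(price = c(10, 20), label = c("low", "high"))
class(d$price)        # "numeric"
```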

1 Answer


dplyr has a nice function called ntile() that can help with this. For example, if you have a data.frame called myData:

library(dplyr)

price <- runif(20, 0, 100)
data1 <- rnorm(20)
data2 <- rpois(20, 2)

myData <- data.frame(price, data1, data2)

myData$price.bin <- ntile(myData$price, 4)
## because you are looking for 4 bins.

myData$price.bin <- sapply(myData$price.bin, function(x)
                                             if (x == 1) "low"
                                        else if (x == 2) "medium"
                                        else if (x == 3) "high"
                                        else if (x == 4) "very high")

That should do the trick. Note that there are probably more efficient ways to do this, but I figured this would be readable and get the point across. The key is the ntile() function in the dplyr package.
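Since ntile() returns the integers 1 to 4, the sapply() step can also be written without a chain of if/else. A sketch, assuming `price.bin` already holds such bin numbers (the values below are made up in place of real ntile() output):

```r
# Stand-in for ntile() output: bin numbers 1-4 (hypothetical values).
price.bin <- c(3, 1, 4, 2, 2, 4, 1, 3)

bin.labels <- c("low", "medium", "high", "very high")

# Option 1: index the label vector by the bin number (returns character).
labels.chr <- bin.labels[price.bin]

# Option 2: a factor whose levels keep the low-to-high order.
labels.fac <- factor(price.bin, levels = 1:4, labels = bin.labels)

levels(labels.fac)   # "low" "medium" "high" "very high"
```

The factor version keeps the levels in low-to-high order; a character column would end up with alphabetical levels ("high", "low", "medium", "very high") once a model such as aov() converts it to a factor.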

Bryan Goggin