I'd like to sort a categorical variable my own way. I have grouped my dataset into categories like "1-5", "6-10", "11-20", ..., ">251". When I plot the variable or display it in a table, the order of the legend entries or of the labels is "messed up".

This is not surprising, since R does not know that these unordered categories do in fact have an order. Is there a way to attach a manually defined order to them?

Thanks in advance for any suggestions!

Matt Bannert
  • Please can you provide some sample (minimal, reproducible) code, so that we can see exactly what you are trying to do? – Richie Cotton Jul 20 '10 at 12:20
  • I agree, the answer to your question was already given when you phrased it a little differently earlier. The answer is still the same, cut(). – John Jul 20 '10 at 13:45
  • I don't think cut() is the answer to the more general question of reordering factors. My 2c below. – Eduardo Leoni Jul 20 '10 at 18:13

3 Answers

Categorical variables are stored as (or converted to) factors when you plot them. The order in which they appear in the plot is determined by the order of the factor's levels.

You likely want to use cut to create your groups, e.g.

dfr <- data.frame(x = runif(100, 1, 256))
dfr$groups <- cut(dfr$x, seq(1, 256, 5))

This problem is also very similar to another recent SO question.
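
For the uneven bins described in the question, cut can also take explicit breaks and labels. The following is a minimal sketch: the sizes, break points and labels are assumptions chosen for illustration, since the question does not list all of its categories.

```r
# Hypothetical sizes; the real data are not shown in the question.
sizes <- c(3, 250, 7, 18, 300, 1, 12)

# Illustrative break points: (0,5], (5,10], (10,20], (20,250], (250,Inf]
breaks <- c(0, 5, 10, 20, 250, Inf)
labels <- c("1-5", "6-10", "11-20", "21-250", ">250")

groups <- cut(sizes, breaks = breaks, labels = labels)

levels(groups)  # levels follow the order of the breaks
table(groups)   # counts are reported in level order
```

Because the levels are created in the order of the breaks, tables and plot legends built from groups should already appear in the intended sequence.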

Richie Cotton
  • In fact, this advice seems to be just what rcs told you for your previous question http://stackoverflow.com/questions/3288361/create-size-categories-without-nested-ifelse-in-r – Richie Cotton Jul 20 '10 at 12:26
  • Yup, you're right. Obviously I need to take a closer look at that cut function. I've just used ifelse so far... thanks for pointing it out. Besides, I am still interested in rearranging the order once I have set it the wrong way... – Matt Bannert Jul 20 '10 at 12:58
  • cut() will allow you to set the order. If you need further control, factor() allows you to set the order as well. – John Jul 20 '10 at 13:47

When I want to specify a different order for a factor manually (tedious, but sometimes necessary) here is what I do:

> ## a factor
> x <- factor(letters[1:3])
> ## write out levels with dput
> dput(levels(x))
c("a", "b", "c")
> ## copy, paste, modify and use factor again. e.g.
> x <- factor(x, levels=c("b", "a", "c"))
> x
[1] a b c
Levels: b a c
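
The same approach should apply to size categories that were already built as character strings (for example with ifelse). A sketch, with the category labels assumed for illustration:

```r
# Hypothetical category labels stored as plain strings
size <- c("6-10", ">250", "1-5", "11-20", "1-5", "6-10")

# Default factor(): levels are sorted as strings, so the order is not numeric
levels(factor(size))

# Spell out the intended order explicitly
size_f <- factor(size, levels = c("1-5", "6-10", "11-20", ">250"))
levels(size_f)
table(size_f)  # tables and plot legends follow this level order
```

If comparisons between categories are also needed, passing ordered = TRUE to factor() creates an ordered factor with the same level sequence.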
Eduardo Leoni

I like using split for that sort of thing.

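## ten random values
vect <- runif(10)
## one category label per value: five "A", three "B", two "E"
vect.categories <- rep(LETTERS[c(1, 2, 5)], times = c(5, 3, 2))
## list with one numeric vector per category
category.list <- split(vect, vect.categories)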

....

May not be related, but thought I'd offer the suggestion.
