I have a variant of a problem that "Reorder levels of a factor without changing order of values" does not answer:
A variable in a dataset has mixed numbers and strings (I know that this is undesirable, but it's there), like 4 8 16 64 128 default. When building the initial factor, the levels are kept in the order found (which is sorted).
However, when I build subsets (which requires cleaning up stale levels), the levels are sorted as strings, like 128 16 4 64 8, even if the subset only contains numeric levels. This is bad when doing a boxplot(var ~ factor).
When I tried the solution from the question cited above (factor(var, levels=sort(var))), the levels ended up with duplicates.
Most similar answers assume the levels are known in advance, which is not true in my case. How can I reorder the factor so that the levels are sorted numerically?
Example:
> a<-c(1,3,5,7,2)
> b<-c(4,8,16,32,"default")
> df<-data.frame(a, b)
> df$b<-factor(df$b)
> str(df)
'data.frame': 5 obs. of 2 variables:
$ a: num 1 3 5 7 2
$ b: Factor w/ 5 levels "16","32","4",..: 3 4 1 2 5
> ss<-subset(df, b != "default")
> factor(ss$b)
[1] 4 8 16 32
Levels: 16 32 4 8
> factor(ss$b,levels=sort(ss$b))
[1] 4 8 16 32
Levels: 16 32 4 8
> ss$b<-factor(ss$b,levels=sort(ss$b))
> boxplot(ss$a ~ ss$b)
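
For reference, here is a minimal sketch of the behaviour I am after, not a definitive solution. It assumes that every numeric-looking level converts cleanly with as.numeric() and that non-numeric labels such as "default" may be placed last. The helper name sort_levels_numeric is hypothetical (my own, not a base R function).

```r
# Rebuild the example data from above.
df <- data.frame(a = c(1, 3, 5, 7, 2),
                 b = factor(c(4, 8, 16, 32, "default")))
ss <- subset(df, b != "default")

# Hypothetical helper (not part of base R): rebuild the factor with its
# levels ordered numerically; non-numeric labels are placed at the end.
sort_levels_numeric <- function(f) {
  lv  <- unique(as.character(f))           # values actually present, so stale levels are dropped
  lv  <- lv[!is.na(lv)]                    # ignore missing values when building levels
  num <- suppressWarnings(as.numeric(lv))  # NA for non-numeric labels such as "default"
  factor(as.character(f), levels = lv[order(num, na.last = TRUE)])
}

ss$b <- sort_levels_numeric(ss$b)
levels(ss$b)   # "4" "8" "16" "32"
boxplot(a ~ b, data = ss)
```

Taking unique() of the values instead of sort() on the factor avoids both the duplicated levels and the string ordering, because the ordering is computed on the numeric conversion of the labels rather than on the labels themselves.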