I have a factor variable which is composed of two substrings separated by a _, like string1_string2. I want to set factor levels of the prefix ("string1") and suffix ("string2") separately, and then define an overall set of factor levels for the concatenated string. In addition, the precedence of levels in the first vs the second substring may vary.
A small example of what I want to achieve:
# reproducible data
x <- factor(c("DBO_A", "PH_A", "COND_A", "DBO_B", "PH_B", "COND_B", "DBO_C", "PH_C", "COND_C"))
[1] DBO_A PH_A COND_A DBO_B PH_B COND_B DBO_C PH_C COND_C
Levels: COND_A COND_B COND_C DBO_A DBO_B DBO_C PH_A PH_B PH_C
If I don't define the factor levels, they will be ordered alphabetically. Now I want to set the levels of the strings on the left and right side of the _ separator, e.g.
PH < COND < DBO on the left side (LHS).
B < A < C on the right side (RHS).
In addition, I want to specify which side, LHS or RHS, has precedence over the other. Depending on which side has precedence, the overall order of levels will differ:
(1) If the LHS levels take precedence:
[1] DBO_A PH_A COND_A DBO_B PH_B COND_B DBO_C PH_C COND_C
Levels: PH_B PH_A PH_C COND_B COND_A COND_C DBO_B DBO_A DBO_C
(2) If the RHS levels take precedence:
[1] DBO_A PH_A COND_A DBO_B PH_B COND_B DBO_C PH_C COND_C
Levels: PH_B COND_B DBO_B PH_A COND_A DBO_A PH_C COND_C DBO_C
So far my only idea is to spell out every level by hand, like factor(x, levels = c(xx, xx, ...)), but I have many more levels than the example above shows, so this would look ridiculous.
Note: I don't want to change the order of my data, only the order of the levels.
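To make the two target orderings concrete, here is a minimal sketch of how the level vectors could be generated rather than typed out. It assumes every LHS/RHS combination is a valid level; the names lhs, rhs, lev_lhs, lev_rhs, x_lhs and x_rhs are hypothetical and only used for illustration.

```r
# reproducible data from the example above
x <- factor(c("DBO_A", "PH_A", "COND_A", "DBO_B", "PH_B", "COND_B", "DBO_C", "PH_C", "COND_C"))

# desired order on each side of the "_" separator (hypothetical names)
lhs <- c("PH", "COND", "DBO")
rhs <- c("B", "A", "C")

# LHS has precedence: the LHS varies slowest, the RHS cycles within each LHS value
lev_lhs <- paste(rep(lhs, each = length(rhs)), rep(rhs, times = length(lhs)), sep = "_")

# RHS has precedence: the RHS varies slowest, the LHS cycles within each RHS value
lev_rhs <- paste(rep(lhs, times = length(rhs)), rep(rhs, each = length(lhs)), sep = "_")

# re-level only; the order of the data itself is left untouched
x_lhs <- factor(x, levels = lev_lhs)
x_rhs <- factor(x, levels = lev_rhs)

levels(x_lhs)
levels(x_rhs)
```

If some combinations never occur in the data, the generated vectors would contain unused levels; calling droplevels() on the result should remove them while keeping the requested order.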