
I am working with a simple data.frame and want to add a column to it whose values are taken from a character vector. The problem is that the values are assigned in the wrong order.

I have a data frame that looks like this:

>df
       Sample                 ID
1         558  Benign_or_BL.FFPE
2         105  Benign_or_BL.FFPE
3          37         Early.FFPE
4          79         Early.FFPE
5         180         Early.FFPE
6         133          Late.FFPE
7         152          Late.FFPE
8         265          Late.FFPE
9         558 Benign_or_BL.Fresh
10        105 Benign_or_BL.Fresh
11        573 Benign_or_BL.Fresh
12        374 Benign_or_BL.Fresh
13        307 Benign_or_BL.Fresh
14        403 Benign_or_BL.Fresh
15         37        Early.Fresh
16         79        Early.Fresh
17        180        Early.Fresh
18        584        Early.Fresh
19        482        Early.Fresh
20        500        Early.Fresh
21        571        Early.Fresh
22        572        Early.Fresh
23        371        Early.Fresh
24        133         Late.Fresh
25        152         Late.Fresh
26        265         Late.Fresh
27         65         Late.Fresh
28        422         Late.Fresh
29        562         Late.Fresh
30        485         Late.Fresh
31        492         Late.Fresh
32        518         Late.Fresh

What I want to do is simply to assign a HEX color code to each level of the column df$ID.

My first attempt was to create an object containing the same number of colors as there are levels in df$ID. Here is what I did:

> levels(as.factor(df$ID))
[1] "Benign_or_BL.FFPE" "Benign_or_BL.Fresh" "Early.FFPE" "Early.Fresh" "Late.FFPE"         
[6] "Late.Fresh"

Now I create an object with the colors I want, in that exact same order:

> colors <- c("#9b9dff","#5153ff","#0003e0","#f6a5aa","#ee4c55","#c4131d")

Then, when I add an extra column containing the color codes, I get this:

> df$colcode <- colors[as.factor(df$ID)]

> head(df, n=10)
       Sample                ID  colcode
1         558  Benign_or_BL.FFPE #9b9dff
2         105  Benign_or_BL.FFPE #9b9dff
3          37         Early.FFPE #0003e0
4          79         Early.FFPE #0003e0
5         180         Early.FFPE #0003e0
6         133          Late.FFPE #ee4c55
7         152          Late.FFPE #ee4c55
8         265          Late.FFPE #ee4c55
9         558 Benign_or_BL.Fresh #5153ff
10        105 Benign_or_BL.Fresh #5153ff

As you can see, the order of the color codes in the new column differs from their order in the colors object.

What I am expecting is this:

> head(df, n=10)
       Sample                ID  colcode
1         558  Benign_or_BL.FFPE #9b9dff
2         105  Benign_or_BL.FFPE #9b9dff
3          37         Early.FFPE #5153ff
4          79         Early.FFPE #5153ff
5         180         Early.FFPE #5153ff
6         133          Late.FFPE #0003e0
7         152          Late.FFPE #0003e0
8         265          Late.FFPE #0003e0
9         558 Benign_or_BL.Fresh #f6a5aa
10        105 Benign_or_BL.Fresh #f6a5aa

What is going on here? Any help is greatly appreciated.

Ronak Shah
Douglas

1 Answer


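By default, factor() sorts the unique values to build its levels, so for character data the levels come out in alphabetical order rather than in order of appearance. Indexing colors with the factor then picks colors by those sorted positions. For example: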

temp <- c("a", "c", "d", "b")
levels(factor(temp))
#[1] "a" "b" "c" "d"

If we want the levels to follow their order of occurrence in the data, we need to specify them explicitly using unique:

levels(factor(temp, levels = unique(temp)))
#[1] "a" "c" "d" "b"
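
The reason the level order matters for indexing is that a factor used as a subscript is treated as its underlying integer codes, not as its labels. A minimal sketch with the same temp vector (palette_demo and its color names are made up for illustration, listed in order of appearance):

```r
temp <- c("a", "c", "d", "b")
palette_demo <- c("red", "green", "blue", "black")  # hypothetical colors

# Default (sorted) levels: codes are positions within a, b, c, d
sorted_codes <- as.integer(factor(temp))
sorted_codes
#[1] 1 3 4 2
sorted_pick <- palette_demo[factor(temp)]
sorted_pick
#[1] "red"   "blue"  "black" "green"

# Levels in order of appearance: codes are simply 1, 2, 3, 4
seen_codes <- as.integer(factor(temp, levels = unique(temp)))
seen_codes
#[1] 1 2 3 4
seen_pick <- palette_demo[factor(temp, levels = unique(temp))]
seen_pick
#[1] "red"   "green" "blue"  "black"
```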

So in your case, we can do:

df$ID <- factor(df$ID, levels = unique(df$ID))
df$colcode <- colors[df$ID]


df
#   Sample                 ID colcode
#1     558  Benign_or_BL.FFPE #9b9dff
#2     105  Benign_or_BL.FFPE #9b9dff
#3      37         Early.FFPE #5153ff
#4      79         Early.FFPE #5153ff
#5     180         Early.FFPE #5153ff
#6     133          Late.FFPE #0003e0
#7     152          Late.FFPE #0003e0
#8     265          Late.FFPE #0003e0
#9     558 Benign_or_BL.Fresh #f6a5aa
#10    105 Benign_or_BL.Fresh #f6a5aa
#....
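
To try this without the full data set, here is a small reproducible sketch. It assumes a cut-down df with one row per ID (rows 1, 3, 6, 9, 15 and 24 of the question's data) and the same colors vector; the name wrong is introduced here only for comparison.

```r
# One row per ID, taken from the question's data, in order of appearance
df <- data.frame(
  Sample = c(558, 37, 133, 558, 37, 133),
  ID = c("Benign_or_BL.FFPE", "Early.FFPE", "Late.FFPE",
         "Benign_or_BL.Fresh", "Early.Fresh", "Late.Fresh"),
  stringsAsFactors = FALSE
)
colors <- c("#9b9dff", "#5153ff", "#0003e0", "#f6a5aa", "#ee4c55", "#c4131d")

# Sorted levels (the default) reproduce the mapping seen in the question
wrong <- colors[as.factor(df$ID)]
wrong
#[1] "#9b9dff" "#0003e0" "#ee4c55" "#5153ff" "#f6a5aa" "#c4131d"

# Levels in order of appearance give the intended mapping
df$ID <- factor(df$ID, levels = unique(df$ID))
df$colcode <- colors[df$ID]
df$colcode
#[1] "#9b9dff" "#5153ff" "#0003e0" "#f6a5aa" "#ee4c55" "#c4131d"
```

Note that this only works as intended when colors is listed in the order in which the IDs first appear in the data.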

Similarly, we can use match, which leaves df$ID as it is:

df$colcode <- colors[match(df$ID, unique(df$ID))]
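
Both approaches above depend on colors being listed in the order the IDs first appear. A different technique that avoids relying on any ordering is a named vector used as a lookup table. The sketch below is an assumption-laden illustration, not part of the original code: color_map and ids are hypothetical names, and the ID-to-color pairing is inferred from the expected output in the question.

```r
# A few ID values standing in for df$ID
ids <- c("Benign_or_BL.FFPE", "Early.FFPE", "Late.FFPE", "Benign_or_BL.Fresh")

# Hypothetical lookup table: names are the ID values, values are the colors
color_map <- c(
  "Benign_or_BL.FFPE"  = "#9b9dff",
  "Early.FFPE"         = "#5153ff",
  "Late.FFPE"          = "#0003e0",
  "Benign_or_BL.Fresh" = "#f6a5aa",
  "Early.Fresh"        = "#ee4c55",
  "Late.Fresh"         = "#c4131d"
)

# as.character() forces a lookup by name even if ids is a factor;
# unname() drops the names carried over from color_map
colcode <- unname(color_map[as.character(ids)])
colcode
#[1] "#9b9dff" "#5153ff" "#0003e0" "#f6a5aa"
```

With the data frame from the question, the same idea would be df$colcode <- unname(color_map[as.character(df$ID)]).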
Ronak Shah