Create a new column based on values from other variables

Question

I have data that looks like this:

A set of 10 character variables

Char<-c("A","B","C","D","E","F","G","H","I","J")

And a data frame that looks like this

Col1<-seq(1:25)
Col2<-c(1,1,1,1,1,2,2,2,2,2,3,3,3,3,3,4,4,4,4,4,5,5,5,5,5)
DF<-data.frame(Col1,Col2)

What I would like to do is add a third column to the data frame, with the logic that 1=A, 2=B, 3=C and so on. So the end result would be:

Col3<-c("A","A","A","A","A","B","B","B","B","B","C","C","C","C","C","D","D","D","D","D","E","E","E","E","E")
DF<-data.frame(Col1,Col2,Col3)

For this simple example I could go with a simple substitution like this question: Create new column based on 4 values in another column

But my actual data set is much bigger with a lot more variables than this simple example, so writing out the equivalents as in the above answer is not a possibility.

So I would like to have a bit of code that can be applied to a much larger data frame. Perhaps something that looped through all the values of Col2 and matched them to the location of Char.

1=Char[1]  2=Char[2] 3=Char[3]...... for the entire length of Col2

Or any other way that could scale up to a long monstrous data frame

`Char[Col2]` gives the output in your example. Is that all you need? — Pierre L, Sep 23 '15 at 21:52
I like the simplicity of this. It works well for this example, but in my data set Col2 is not a simple series of numbers. But if I could take my actual data and turn it into a series of numbers like above (maybe using unique?), then this approach would be perfect. — Vinterwoo, Sep 23 '15 at 22:16
If your lookup codes vary, you can name the `Char` vector and it will act as a lookup table. `names(Char) <- codes`. Then you can use `Char[Col2]` and it will subset on the names not the index. — Pierre L, Sep 23 '15 at 22:29
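The named-vector lookup from the comment above can be sketched as follows. The `codes` values (10, 20, 30) and the `lookup` name are hypothetical, chosen only for illustration. One caveat worth stating: names are stored as character, so a numeric column most likely needs `as.character()` before subsetting, otherwise R indexes by position rather than by name.

```r
# Hypothetical non-sequential codes (illustrative only, not from the question)
codes  <- c(10, 20, 30)
lookup <- c("A", "B", "C")
names(lookup) <- codes   # names are coerced to character: "10", "20", "30"

DF <- data.frame(Col1 = 1:6, Col2 = c(10, 10, 20, 20, 30, 30))

# Convert the codes to character so subsetting matches on names, not position
DF$Col3 <- unname(lookup[as.character(DF$Col2)])
```

Codes that have no entry in `lookup` come back as `NA`, which makes missing mappings easy to spot.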

Jared Gossett · Answer 1 · 2015-09-23T23:36:23.347

5

# Values that Col2 might have taken
levels = c(1, 2, 3, 4, 5)

# Labels for the levels in same order as levels
labels = c('A', 'B', 'C', 'D', 'E')

DF$Col3 <- factor(DF$Col2, levels = levels, labels = labels)
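For a larger data frame, the levels need not be typed out by hand. Below is a minimal sketch that derives them from the data, assuming the sorted distinct values of Col2 correspond, in order, to the entries of Char; the `codes` and `Col3_chr` names are introduced here for illustration.

```r
Char <- c("A", "B", "C", "D", "E", "F", "G", "H", "I", "J")
DF <- data.frame(Col1 = 1:25, Col2 = rep(1:5, each = 5))

# Distinct codes present in the data, in sorted order
codes <- sort(unique(DF$Col2))

# Pair each code with the label at the same position in Char
DF$Col3 <- factor(DF$Col2, levels = codes, labels = Char[seq_along(codes)])

# factor() returns a factor; convert if a plain character column is wanted
DF$Col3_chr <- as.character(DF$Col3)
```

Any value of Col2 that is not listed in `levels` becomes `NA` in the result.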

edited Sep 23 '15 at 23:36

answered Sep 23 '15 at 23:19

Jared Gossett


score 3 · Answer 2 · answered Sep 23 '15 at 21:47

If you wanted to use each column as an index into some vector (I'll use letters so I can index up to 25), returning a data frame of the same dimension of DF, you could use:

transformed <- as.data.frame(lapply(DF, function(x) letters[x]))
head(transformed)
#   Col1 Col2
# 1    a    a
# 2    b    a
# 3    c    a
# 4    d    a
# 5    e    a
# 6    f    b

You could then combine this with your original data frame with cbind(DF, transformed).
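A small sketch of that combination step, with one addition that is not in the answer: the transformed columns are renamed first (the `_letter` suffix is an arbitrary choice) so the combined data frame does not end up with duplicate column names.

```r
DF <- data.frame(Col1 = 1:25, Col2 = rep(1:5, each = 5))

# Index into letters with every column; keep the results as character
transformed <- as.data.frame(lapply(DF, function(x) letters[x]),
                             stringsAsFactors = FALSE)

# Rename to avoid two columns called Col1 and two called Col2
names(transformed) <- paste0(names(DF), "_letter")

combined <- cbind(DF, transformed)
```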

score 3 · Accepted Answer · answered Sep 23 '15 at 22:14


I know it may be taboo to use for loops in R, but I tried this out and it worked well.

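# Preallocate the new column, then fill it row by row
DF$Col3 <- character(nrow(DF))
for (i in seq_along(DF$Col2)) {  # length(DF$Col2) alone would visit only the last row
    DF$Col3[i] <- Char[DF$Col2[i]]
}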

Would that be sufficient? If Col2 is not a simple sequence of numbers, I think you could also use unique(DF$Col2) or levels(factor(DF$Col2)) to get the distinct values and index from those.

Perhaps though I'm misunderstanding your question.
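The unique() idea can be made concrete with match(), which returns the position of each value within a vector of distinct codes. This is a vectorised sketch rather than a loop, and it assumes labels are assigned in order of first appearance; the example codes ("x9", "k2", "m5") are hypothetical.

```r
Char <- c("A", "B", "C", "D", "E", "F", "G", "H", "I", "J")

# Hypothetical codes that are not a simple 1, 2, 3, ... series
DF <- data.frame(Col1 = 1:6,
                 Col2 = c("x9", "x9", "k2", "k2", "m5", "x9"),
                 stringsAsFactors = FALSE)

codes <- unique(DF$Col2)       # distinct codes in order of first appearance
stopifnot(length(codes) <= length(Char))  # need one label per code

idx <- match(DF$Col2, codes)   # position of each row's code within codes
DF$Col3 <- Char[idx]           # look up the label at that position
```

Using sort(unique(DF$Col2)) instead would assign labels in sorted order of the codes rather than order of appearance.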

asshah4


I like for loops as I find them a little more intuitive myself – Vinterwoo Sep 24 '15 at 00:17

score 3 · Answer 4 · answered Sep 23 '15 at 22:50


Why not make a key and join?

library(dplyr)

# tibble() replaces the deprecated data_frame()
letter_key = tibble(letter__ID = 1:26,
                    letter = letters)

DF %>%
  rename(letter__ID = Col2) %>%
  left_join(letter_key, by = "letter__ID")  # name the join column explicitly

This kind of thing can also be done with factors.

bramtayl
