4

I have data that looks like this:

A set of 10 character variables

Char<-c("A","B","C","D","E","F","G","H","I","J")

And a data frame that looks like this

Col1<-seq(1:25)
Col2<-c(1,1,1,1,1,2,2,2,2,2,3,3,3,3,3,4,4,4,4,4,5,5,5,5,5)
DF<-data.frame(Col1,Col2)

What I would like to do is add a third column to the data frame, with the logic that 1=A, 2=B, 3=C and so on. The end result would be:

Col3<-c("A","A","A","A","A","B","B","B","B","B","C","C","C","C","C","D","D","D","D","D","E","E","E","E","E")
DF<-data.frame(Col1,Col2,Col3)

For this simple example I could use a simple substitution, as in this question: Create new column based on 4 values in another column

But my actual data set is much bigger with a lot more variables than this simple example, so writing out the equivalents as in the above answer is not a possibility.

So I would like to have a bit of code that can be applied to a much larger data frame. Perhaps something that looped through all the values of Col2 and matched them to the location of Char.

1=Char[1]  2=Char[2] 3=Char[3]...... for the entire length of Col2

Or any other way that could scale up to a long monstrous data frame

Vinterwoo
  • 3
    `Char[Col2]` gives the output in your example. Is that all you need? – Pierre L Sep 23 '15 at 21:52
  • 2
    I like the simplicity of this. It works well for this example, but in my data set Col2 is not a simple series of numbers. But if I could take my actual data and turn it into a series of numbers like above (maybe using unique?), then this approach would be perfect. – Vinterwoo Sep 23 '15 at 22:16
  • 2
    If your lookup codes vary, you can name the `Char` vector and it will act as a lookup table. `names(Char) <- codes`. Then you can use `Char[Col2]` and it will subset on the names not the index. – Pierre L Sep 23 '15 at 22:29
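
The named-vector lookup suggested in the comment above can be sketched as follows. The `codes` values and the sample `Col2` are made up for illustration. One assumption worth stating: subsetting by name needs a character index, so a numeric `Col2` is converted with `as.character()` first; otherwise R would index by position.

```r
# Hypothetical non-sequential codes (not from the original data)
codes <- c(10, 25, 40)
Char <- c("A", "B", "C")
names(Char) <- codes          # names are stored as character strings

DF <- data.frame(Col1 = 1:6, Col2 = c(10, 10, 25, 25, 40, 40))

# Index by name: convert to character so R does not index by position
DF$Col3 <- unname(Char[as.character(DF$Col2)])
DF$Col3
```

Codes that have no matching name come back as `NA`, which makes unmapped values easy to spot.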

4 Answers

5
# Values that Col2 might have taken
levels = c(1, 2, 3, 4, 5)

# Labels for the levels in same order as levels
labels = c('A', 'B', 'C', 'D', 'E')

DF$Col3 <- factor(DF$Col2, levels = levels, labels = labels)
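
Two things to keep in mind with this approach: the new column is a factor rather than a character vector, and any value missing from `levels` becomes `NA`. A short sketch with made-up data, using the names `lev` and `lab` (not in the original) so the base functions `levels()` and `labels()` are not masked:

```r
lev <- c(1, 2, 3, 4, 5)
lab <- c("A", "B", "C", "D", "E")
Col2 <- c(1, 3, 5, 6)    # 6 is deliberately not in lev

Col3 <- factor(Col2, levels = lev, labels = lab)
as.character(Col3)       # convert to plain character if a factor is not wanted
```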
Jared Gossett
3

If you wanted to use each column as an index into some vector (I'll use letters so I can index up to 25), returning a data frame of the same dimension of DF, you could use:

transformed <- as.data.frame(lapply(DF, function(x) letters[x]))
head(transformed)
#   Col1 Col2
# 1    a    a
# 2    b    a
# 3    c    a
# 4    d    a
# 5    e    a
# 6    f    b

You could then combine this with your original data frame with cbind(DF, transformed).

josliber
3

I know it may be taboo to use for loops in R, but I tried this out and it worked well.

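DF$Col3 <- NA_character_  # preallocate the new column
for (i in seq_along(DF$Col2)) {  # visit every row; length() alone yields only the last index
    DF$Col3[i] <- Char[DF$Col2[i]]
}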

Would that be sufficient? If Col2 is not a simple sequence, I think you could also use unique(DF$Col2) or levels(factor(DF$Col2)) to get the distinct values first.

Perhaps though I'm misunderstanding your question.
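
A minimal sketch of the `unique()` idea, assuming the values of `Col2` are arbitrary codes and that labels should be assigned in order of first appearance. The sample codes are invented. `match()` returns each value's position among the distinct values, and that position then works as an index into `Char` without an explicit loop.

```r
Char <- c("A", "B", "C", "D", "E", "F", "G", "H", "I", "J")
DF <- data.frame(Col1 = 1:6, Col2 = c(7.5, 7.5, 2, 2, 90, 7.5))  # invented codes

# Position of each value among the distinct values, in order of first appearance
idx <- match(DF$Col2, unique(DF$Col2))

DF$Col3 <- Char[idx]
DF$Col3
```

If the labels should follow the sorted order of the codes instead, `sort(unique(DF$Col2))` can replace `unique(DF$Col2)`.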

asshah4
3

Why not make a key and join?

library(dplyr)

# Lookup table mapping each numeric ID to a letter
letter_key = tibble(letter__ID = 1:26,
                    letter = letters)

DF %>%
  rename(letter__ID = Col2) %>%
  left_join(letter_key, by = "letter__ID")

This kind of thing can also be done with factors.

bramtayl