1

I am looking to perform a "find/replace" of factors across a large data frame using characters from another data frame.

To explain with a simple example, I have the following data frame (df):

  ID1 ID2 value
1   A   A  0.01
2   A   D  0.02
3   B   D  0.03
4   B   C  0.04
5   C   F  0.05
6   C   D  0.06
7   D   A  0.07
8   D   C  0.08

And would like to replace each letter with a name, derived from the following data frame (id):

  ID   NAME
1  A   ADAM
2  B    BOB
3  C   CARL
4  D  DAVID
5  E EDWARD
6  F   FRED

To end up with (new.df):

    ID1   ID2 value
1  ADAM  ADAM  0.01
2  ADAM DAVID  0.02
3   BOB DAVID  0.03
4   BOB  CARL  0.04
5  CARL  FRED  0.05
6  CARL DAVID  0.06
7 DAVID  ADAM  0.07
8 DAVID  CARL  0.08

I know there are many simple options that would involve writing out all replacement options (i.e. A="ADAM"), but I need to replace 1000's of factors so this is not an option.

I'm not sure where to start! I tried out car::recode using this recode website but it only replaces in a vector.

  • 1
    `df[1:2] <- lapply(df[1:2], function(x) id$NAME[match(x, id$ID)] )` - this is very close to a duplicate of http://stackoverflow.com/questions/18456968/how-do-i-map-a-vector-of-values-to-another-vector-with-my-own-custom-map-in-r/18457055#18457055 – thelatemail Oct 22 '15 at 03:39
  • Thank you - this worked perfectly. If this is a duplicate post, I'm happy to remove it. However I did spend about two hours trying to find a related post on stackoverflow with no success. – Kelly Williams Oct 22 '15 at 03:52
  • There is no need to remove/delete this question. Having duplicates is often good, as there can be subtle variations in approach or terminology, which will benefit from linking between the questions. Even if it gets marked a duplicate, it will not disappear from search results. Hopefully this will prevent people like yourself needing to spend hours searching in the future. :-) – thelatemail Oct 22 '15 at 03:57

3 Answers3

0

You may want to try

df$ID1 <- as.character(factor(df$ID1, levels=id$ID, labels=id$NAME))
Hao
  • 7,476
  • 1
  • 38
  • 59
  • Thank you, but this is just an example data frame in my post. Since I am replacing >1000 IDs with NAMEs I would like an option where I don't have to define labels. – Kelly Williams Oct 22 '15 at 03:34
  • Try replacing `levels=toupper(letters[1:6])` by `levels=id$ID` and `labels=c("ADAM", "BOB", "CARL", "DAVID", "EDWARD", "FRED")` by `labels=id$NAME`. –  Oct 22 '15 at 03:37
0

A perhaps inefficient way to do this would be with reshaping.

library(dplyr)
library(tidyr)

key = data_frame(
  ID = c("A", "B", "C", "D", "F"),
  name = c("Adam", "Bob", "Carl", "David", "Fred") )

data %>%
  mutate(row_ID = 1:n()) %>%
  gather(variable, ID, -value, -row_ID) %>%
  left_join(key) %>%
  select(-ID) %>%
  spread(variable, name)
bramtayl
  • 4,004
  • 2
  • 11
  • 18
0

We can do this with dplyr (using the same approach as @thelatemail). We use mutate_each, select the columns 'ID1', 'ID2' in the first dataset ('df1'), match with the 'ID' from 'df2' to get the numeric index and use that to replace it with 'NAME' from 'df2'.

library(dplyr)
df1 %>%
     mutate_each(funs(df2$NAME[match(., df2$ID)]), ID1:ID2)
#    ID1   ID2 value
#1  ADAM  ADAM  0.01
#2  ADAM DAVID  0.02
#3   BOB DAVID  0.03
#4   BOB  CARL  0.04
#5  CARL  FRED  0.05
#6  CARL DAVID  0.06
#7 DAVID  ADAM  0.07
#8 DAVID  CARL  0.08
akrun
  • 874,273
  • 37
  • 540
  • 662