0

I have a dataframe of anonymized data, like this:

    31  32  36  NA
    31  32  34  NA
    31  32  NA  NA
    32  33  NA  NA

And I have a "dictionary" that maps each anonymized value to an individual's ID, like this:

    ID  Number
    Male1   31
    Male2   32
    Female1 33
    Female2 34
    Female3 36

I want to replace all of the values in the first dataframe with the corresponding ID, as stored in the second dataframe, like this:

    Male1   Male2   Female3 NA
    Male1   Male2   Female2 NA
    Male1   Male2       NA  NA
    Male2   Female1     NA  NA

I've tried using ifelse statements and replace, but nothing has worked so far. I tried:

Test2 <- ifelse(Dataframe1 == Dataframe2$Number, Dataframe2$ID, Dataframe2$ID)

But this only replaced instances where the same value was in the same place in the dataframe, so I got:

    Male1   32  36  NA
    31      32  34  NA
    31      32  NA  NA
    32      33  NA  NA

I'm pretty much completely stuck. I would really appreciate any help, and especially explanations of solutions- I'm a graduate student who's really new to working with r.

Thanks so much!

  • I think this is `merge` (or `dplyr::*_join`), so a duplicate of https://stackoverflow.com/q/1299871, but it's difficult to know for certain without true data. If that question and its answers resolve your issue, feel free to self-close/delete this question, otherwise please expand your question to include real data, such as with `dput(head(x))` for each frame. See: https://stackoverflow.com/questions/5963269, https://stackoverflow.com/help/mcve, and https://stackoverflow.com/tags/r/info. – r2evans Sep 10 '19 at 17:49
  • Try using multiple left_join's. – Monk Sep 10 '19 at 17:55

1 Answers1

0

Using mapvalues from the plyr package:

df1 <- read.table(text = " 31  32  36  NA
    31  32  34  NA
    31  32  NA  NA
    32  33  NA  NA")
df2 <- read.table(text ="ID  Number
    Male1   31
    Male2   32
    Female1 33
    Female2 34
    Female3 36", header = T)

df3 <- df1
df3[,] <- lapply(df1, function(x) plyr::mapvalues(x, from = df2$Number, to = df2$ID))

Which results in :

    V1      V2      V3   V4
1 Male1   Male2 Female3 <NA>
2 Male1   Male2 Female2 <NA>
3 Male1   Male2    <NA> <NA>
4 Male2 Female1    <NA> <NA>
GordonShumway
  • 1,980
  • 13
  • 19
  • those aren't error message they are warning messages and occur as you don't have all the values in each column. They can be suppressed using `warn_missing = TRUE` in the `mapvalues` call – GordonShumway Sep 10 '19 at 19:42