If data present, replace with data from another column based on row ID

Question

I would like to replace data in columns rep1 to rep4. The data in these columns match unique ID's in the first column. I want to replace the data in columns rep1-rep4 with data in the value column with the corresponding ID row. So, for the second row "b", I want to replace "a" in the column "rep1" with the corresponding value in row "a", in this case, -400.

ID  rep1  rep2  rep3  rep4  value
a                           -400
b   a                       -300
c   a     b                 -200
d   a     b     c           -300
e   a     b     c     d     -400
f                           -400
g   f                       -400
h                           -400
i                           -200
j   k     l                 -300
k   l                       -200
l                           -300
m                           -300

It seems like using ifelse(!is.na()) might be able to do something here, but I'm not sure how to match the ID data in columns rep1 to rep4 to the corresponding row in the ID column, identifying what data in "value" is supposed to be used in the replacement. Can this be done in the same dataframe, or does it need to be split into two different dataframes to work?

Here is the data using dput()

structure(list(ID = structure(1:13, .Label = c("a", "b", "c", 
"d", "e", "f", "g", "h", "i", "j", "k", "l", "m"), class = "factor"), 
    rep1 = structure(c(1L, 2L, 2L, 2L, 2L, 1L, 3L, 1L, 1L, 4L, 
    5L, 1L, 1L), .Label = c("", "a", "f", "k", "l"), class = "factor"), 
    rep2 = structure(c(1L, 1L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 3L, 
    1L, 1L, 1L), .Label = c("", "b", "l"), class = "factor"), 
    rep3 = structure(c(1L, 1L, 1L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 
    1L, 1L, 1L), .Label = c("", "c"), class = "factor"), rep4 = structure(c(1L, 
    1L, 1L, 1L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = c("", 
    "d"), class = "factor"), value = c(-400L, -300L, -200L, -300L, 
    -400L, -400L, -400L, -400L, -200L, -300L, -200L, -300L, -300L
    )), class = "data.frame", row.names = c(NA, -13L))

Can you post your sample data in `dput` format? Please edit **the question** with the output of `dput(df)`. Or, if it is too big with the output of `dput(head(df, 20))`. (`df` is the name of your dataset.) — Rui Barradas, Jul 12 '19 at 06:25
@RuiBarradas I added the sample data using `dput` to the bottom of the question. — Corey, Jul 12 '19 at 06:56

score 4 · Answer 1 · answered Jul 12 '19 at 06:25

A base R way would be to identify names of the column which we want to match (here rep), then unlist them and match with ID and replace them with corresponding value.

cols <- grep("^rep", names(df))
df[cols] <- df$value[match(unlist(df[cols]), df$ID)]

df
#   ID rep1 rep2 rep3 rep4 value
#1   a   NA   NA   NA   NA  -400
#2   b -400   NA   NA   NA  -300
#3   c -400 -300   NA   NA  -200
#4   d -400 -300 -200   NA  -300
#5   e -400 -300 -200 -300  -400
#6   f   NA   NA   NA   NA  -400
#7   g -400   NA   NA   NA  -400
#8   h   NA   NA   NA   NA  -400
#9   i   NA   NA   NA   NA  -200
#10  j -200 -300   NA   NA  -300
#11  k -300   NA   NA   NA  -200
#12  l   NA   NA   NA   NA  -300
#13  m   NA   NA   NA   NA  -300

data

df <- structure(list(ID = c("a", "b", "c", "d", "e", "f", "g", "h", 
"i", "j", "k", "l", "m"), rep1 = c(NA, "a", "a", "a", "a", NA, 
"f", NA, NA, "k", "l", NA, NA), rep2 = c(NA, NA, "b", "b", "b", 
NA, NA, NA, NA, "l", NA, NA, NA), rep3 = c(NA, NA, NA, "c", "c", 
NA, NA, NA, NA, NA, NA, NA, NA), rep4 = c(NA, NA, NA, "MA", "d", 
NA, NA, NA, NA, NA, NA, NA, NA), value = c(-400L, -300L, -200L, 
-300L, -400L, -400L, -400L, -400L, -200L, -300L, -200L, -300L, 
-300L)), class = "data.frame", row.names = c(NA, -13L))

This worked using the sample I provided, but I couldn't get it to function properly on my actual data set. It returned everything as NA. `grep` identified the columns to change, `unlist` seemed to work, but `match` didn't. I think the issue is that in the actual data set, value is an integer while the rep columns are factors. This doesn't seem to be an issue with the tidyverse solution however. — Corey, Jul 12 '19 at 07:39
@Corey strange, I think this should still work with factors as well. Can you convert the columns to characters and then check again? `df[cols] <- lapply(df[cols], as.character)` — Ronak Shah, Jul 12 '19 at 07:44

score 2 · Accepted Answer · answered Jul 12 '19 at 07:10

Here a variant with tidyverse:

df %>% mutate_at(vars(rep1:rep4), ~ value[match(., ID)])

Explanation:

mutate_at allows to select a range of variables to be modified
the ~ ... . (quosure style lambda notation) allows to use an expression in which . (dot) stands for the column to be modified. Otherwise you would have to use function(x) df$value[match(x, df$ID)], which is a lot to type.
vars() are necessary in mutate_at to be able to select columns without quotes (otherwise you would need to use 2:5 or paste0("rep", 1:4)).

This worked well. I had some issues with classes being mixed in my actual dataset, but the tidyverse approach doesn't seem to have any issues dealing with the mismatch. — Corey, Jul 12 '19 at 07:40

If data present, replace with data from another column based on row ID

2 Answers2

Linked