6

I have two tables in different dimensions, now I want to replace value datA$swl1 with values in datB$swl2 according to userids.

datA

 id swl1
 1   0.8
 2   0.7
 3   0.4
 4   0.7
 5   0.0

datB

id   swl2
 1   0.8
 3   0.6
 5   0.7

output

datA (here swl1 is replaced by the new values in swl2, but not all the ids have a new values, for those that haven't, the original values are retained)

 id swl1
 1   0.8
 2   0.7
 3   0.6
 4   0.7
 5   0.7

how to do this?

Matthew Lundberg
  • 42,009
  • 6
  • 90
  • 112
Lucia
  • 615
  • 1
  • 9
  • 16

4 Answers4

7

You can use merge to match by id, then replace in column swl1 those items from datB which exist:

datC <- merge(datA, datB, all.x=TRUE)
datC
##   id swl1 swl2
## 1  1  0.8  0.8
## 2  2  0.7   NA
## 3  3  0.4  0.6
## 4  4  0.7   NA
## 5  5  0.0  0.7

This matches up the rows. Now to replace those values in column swl1 with the non-NA values from column swl2:

datC$swl1 <- ifelse(is.na(datC$swl2), datC$swl1, datC$swl2)
datC$swl2 <- NULL
datC
##   id swl1
## 1  1  0.8
## 2  2  0.7
## 3  3  0.6
## 4  4  0.7
## 5  5  0.7
Matthew Lundberg
  • 42,009
  • 6
  • 90
  • 112
6

You can obtain this result with one line of code:

datA$swl1[datA$id %in% datB$id] <- datB$swl2
#> datA
#  id swl1
#1  1  0.8
#2  2  0.7
#3  3  0.6
#4  4  0.7
#5  5  0.7

With the %in% operator we select the entries of the column datA$swl1 that belong to rows with the same id as those listed in datB. These values in the column of datA$swl1 are then replaced with the entries of the swl2 column of datB.

RHertel
  • 23,412
  • 5
  • 38
  • 64
5

IIUC, using data.table v1.9.5:

require(data.table)
setDT(datA)[datB, swl1 := swl2, on = "id"]

datA is updated by reference.

Arun
  • 116,683
  • 26
  • 284
  • 387
  • This worked for me. One feature of this approach is that any NA values in swl2 will be copied over to swl1. This was a positive for me as I wanted to retain the NA values. – Bong112 Jan 20 '23 at 11:14
1

If you'd like to select the largest value, regardless of which column it is in, you could try

library(dplyr)
datA <- data.frame(id=c(1,2,3,4,5), swl1=c(0.8, 0.7, 0.4, 0.7, 0.0))
datB <- data.frame(id=c(1,3,5), somename=c(0.8, 0.6, 0.7))

datC <- full_join(datA, datB)
datA <- data.frame(id=c(1:5))    
datA$swli1 <- apply(datC[, c('swl1', 'somename')], 1, function(x) max(na.omit(x)))

> datA
  id swli1
1  1   0.8
2  2   0.7
3  3   0.6
4  4   0.7
5  5   0.7
tumultous_rooster
  • 12,150
  • 32
  • 92
  • 149