I am working with customer sales data split across two datasets. The first lists each customerID, the type of customer (A or B), and the corresponding turnover. The second is an external dataset with the same structure. I need to overwrite the turnover in the first dataset with the turnover from the second wherever the combination of customerID and type of customer also appears in the second dataset. Here is some example code:
ID <- c(1,2,3,3,4,5,6,7,7,8,9,10,11,11,12,12,13,14,15)
Type <- c("A","A","A","B","A","A","A","A","B","A","A","A","A","B","A","B","A","A","A")
Turnover <- seq(100,1900,100)
# data.frame() keeps Turnover numeric; cbind() would coerce every column to character
data1 <- data.frame(ID, Type, Turnover)
ID2 <- c(3,7,11,12)
Type2 <- c("B","A","A","A")
Turnover2 <- c(150,450,600,750)
# Same here: build the data frame directly so Turnover2 stays numeric
data2 <- data.frame(ID2, Type2, Turnover2)
My first idea was to use the %in% operator in the following manner:
data1[data1$ID %in% data2$ID2 & data1$Type %in% data2$Type2, "Turnover"] <- data2[data1$ID %in% data2$ID2 & data1$Type %in% data2$Type2, "Turnover2"]
But then I only obtain NAs, and when I do get a turnover, it ignores the combination of ID and Type and matches on ID alone. Is there a clean way to overcome this and use %in% across several columns at once, i.e., match on a combination of columns? I want to keep the first dataset as a whole and only overwrite the turnover for the ID and Type combinations that are also present in the second dataset.
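To make the desired result concrete, here is a minimal sketch of one possible approach, not necessarily the idiomatic one. It assumes that pasting ID and Type into a single key uniquely identifies a row in the second dataset; the names key1, key2, idx, and hit are introduced only for illustration. It uses match() rather than %in%, because %in% only reports whether a match exists, not which row matched.

```r
# Rebuild the example data with data.frame() so Turnover stays numeric
data1 <- data.frame(
  ID = c(1, 2, 3, 3, 4, 5, 6, 7, 7, 8, 9, 10, 11, 11, 12, 12, 13, 14, 15),
  Type = c("A", "A", "A", "B", "A", "A", "A", "A", "B", "A",
           "A", "A", "A", "B", "A", "B", "A", "A", "A"),
  Turnover = seq(100, 1900, 100)
)
data2 <- data.frame(
  ID2 = c(3, 7, 11, 12),
  Type2 = c("B", "A", "A", "A"),
  Turnover2 = c(150, 450, 600, 750)
)

# Composite keys: one string per (ID, Type) pair.
# Assumption: the separator "_" never occurs inside ID or Type.
key1 <- paste(data1$ID, data1$Type, sep = "_")
key2 <- paste(data2$ID2, data2$Type2, sep = "_")

# For each row of data1, the position of the matching row in data2 (NA if none)
idx <- match(key1, key2)
hit <- !is.na(idx)

# Overwrite only the matched rows, taking each value from its matching data2 row
data1$Turnover[hit] <- data2$Turnover2[idx[hit]]
```

With the example data, the rows for (3, B), (7, A), (11, A), and (12, A) end up with 150, 450, 600, and 750, and every other row keeps its original turnover.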