I'm trying to left join between two data.tables using a condition (income>cost for example). I need to this in data.table (and not say dplyr) due to efficiency as I'm working huge datasets. My problem is, when I join using the following code, my "income" column gets overwritten with the values of the cost column.
Any suggestion to fix this behavior?
A <- data.table(ID=c(1,2,3,4),cost=c(1000,2000,3000,4000))
B <- data.table(income=c(1500,2500,3500,4500),name=c("A","B","C","D"))
#The output:
B[A,.(ID,cost,income,name),on=.(income>cost)
,allow.cartesian=TRUE
,nomatch=NULL]
#Expected output:
A <- A %>% mutate(ID_j=1)
B <- B %>% mutate(ID_j=1)
A %>%
left_join(B) %>%
as.data.frame() %>%
filter(cost<income) %>%
select(-ID_j)
Used different libraries, but I need to make it work with data.table.