I would like to update the dataframe d_sub
with two new columns x
,y
(and excluding column xy
) based on the matching of the common columns(treatment
,replicate
) in the parent dataframe d
.
set.seed(0)
x <- rep(1:10, 4)
y <- sample(c(rep(1:10, 2)+rnorm(20)/5, rep(6:15, 2) + rnorm(20)/5))
treatment <- sample(gl(8, 5, 40, labels=letters[1:8]))
replicate <- sample(gl(8, 5, 40))
d <- data.frame(x=x, y=y, xy=x*y, treatment=treatment, replicate=replicate)
d_sub <- d[sample(nrow(d),6),4:5]
d_sub
# treatment replicate
# 32 b 2
# 11 h 7
# 9 h 3
# 20 e 3
# 10 b 5
# 7 d 3
Unlike the normal merge
or other methods mentioned here, I would only need to extract few columns as shown in the below expected output:
# treatment replicate x y
# 32 b 2 2 8.998847
# 11 h 7 1 5.082928
# 9 h 3 2 7.050445
# 20 e 3 10 10.145350
# 10 b 5 10 7.941056
# 7 d 3 7 6.814287
Note the exclusion of xy
column in the output here! In my original problem, there are thousands of columns which I would not require in the output than the required very few columns. I am especially looking for methods other than merge
to know if I can achieve the solution in a memory-efficient way.