1

I would like to update the dataframe d_sub with two new columns x,y(and excluding column xy) based on the matching of the common columns(treatment,replicate) in the parent dataframe d.

set.seed(0)
x <- rep(1:10, 4)
y <- sample(c(rep(1:10, 2)+rnorm(20)/5, rep(6:15, 2) + rnorm(20)/5))
treatment <- sample(gl(8, 5, 40, labels=letters[1:8]))
replicate <- sample(gl(8, 5, 40))
d <- data.frame(x=x, y=y, xy=x*y, treatment=treatment, replicate=replicate)

d_sub <- d[sample(nrow(d),6),4:5]
d_sub
#     treatment replicate
# 32         b         2
# 11         h         7
# 9          h         3
# 20         e         3
# 10         b         5
# 7          d         3

Unlike the normal merge or other methods mentioned here, I would only need to extract few columns as shown in the below expected output:

#     treatment replicate x         y
# 32         b         2  2  8.998847
# 11         h         7  1  5.082928
# 9          h         3  2  7.050445
# 20         e         3 10 10.145350
# 10         b         5 10  7.941056
# 7          d         3  7  6.814287

Note the exclusion of xy column in the output here! In my original problem, there are thousands of columns which I would not require in the output than the required very few columns. I am especially looking for methods other than merge to know if I can achieve the solution in a memory-efficient way.

Prradep
  • 5,506
  • 5
  • 43
  • 84
  • 1
    You can remove the unwanted columns beforehand and then merge. Trying to match on two columns would require to paste the two variables in both dataframes and the use match to index which I don't think It will be faster than merge. All questions I could find about matching two columns are answered with merge. You can always use `data.table` methods for additional efficiency – Sotos Jun 20 '17 at 13:55

1 Answers1

1

I guess it has been asked here before, but what you are looking for is:

merge(d_sub, d, by=c("treatment", "replicate"))

or:

d_sub <- merge(d_sub, d, by=c("treatment", "replicate"))
Jan
  • 3,825
  • 3
  • 31
  • 51
  • Thanks, but I would not require all the columns. – Prradep Jun 20 '17 at 13:14
  • If you use dplyr (as I guess from the tags used) then it would just be my second statement followed by this line: d_sub <- d_sub %>% select (-xy) – Jan Jun 20 '17 at 13:39