0

I'm currently working with data.table and am attempting to merge on two tables that resemble the following:

wflanthro:

height   sex   l
18       1     2
18       2     2.4
19       1     3
19       2     3
20       1     3.1
20       2     3.2
21       1     4
21       2     4
22       1     5
22       2     5.1

length:

ID low.len  sex
1  20       1
2  21       2
3  22       1
4  21       1
5  19       2

I want to obtain the following table from merging the two together: (notice that low.len and height are merged on each other)

ID low.len  sex   l
1  20       1     3.1
2  21       2     4
3  22       1     5
4  21       1     4
5  19       2     3

I have read up on the usage of by.y and by.x statements and was hoping to utilize that here, but I'm not having luck using more than 1 variable in the by.y and by.x vectors

here's what I've tried:

 l_value_diff_len_gr_0<-merge(wflanthro,length,by.x=("height", "sex"), by.y=("low.len","sex"))

This errors out and does not work. I do not want to change the names of the variables formally with a setnames statement as I have to produce several tables that have the same format as above on various sets of variables, so figuring out how to merge them similarly to the code written above without changing the variable names is ideal. What am I doing wrong?

bziggy
  • 463
  • 5
  • 19
  • There is no column with name `htcm` in your example – akrun Nov 23 '20 at 22:56
  • @akrun thanks for catching that! This was an unintended mistake on my end in my post, but doesn't fixt the error. I just updated it with the height variable name – bziggy Nov 23 '20 at 22:58
  • What is the error? Please post its exact message. Also, be sure to avoid naming objects to R functions and keywords like `length`. – Parfait Nov 24 '20 at 00:29
  • If this is a data.table question, this should be done in the following manner `library(data.table) ; setDT(wflanthro) ; setDT(length) ; wflanthro[length, l := i.l, on = .(low.len = height, sex)]`. Then check `wflanthro` - it should be updated in place – David Arenburg Nov 24 '20 at 07:15

1 Answers1

0

It is a mistake to use the merge function.

That would be right to merge the two dataframe with two merging key.

# Input
wflanthro <- structure(list(ID = c(1, 2, 3, 4, 5), low.len = c(20, 21, 22, 
21, 19), sex = c(1, 2, 1, 1, 2)), row.names = c(NA, -5L), class = c("tbl_df", 
"tbl", "data.frame"))

length <- structure(list(height = c(18, 18, 19, 19, 20, 20, 21, 21, 22, 
22), sex = c(1, 2, 1, 2, 1, 2, 1, 2, 1, 2), l = c(2, 2.4, 3, 
3, 3.1, 3.2, 4, 4, 5, 5)), row.names = c(NA, -10L), class = c("tbl_df", 
"tbl", "data.frame"))

# merge
l_value_diff_len_gr_0 <- merge(wflanthro, length, by.x=c("height", "sex"), by.y=c("low.len","sex")) %>% arrange(ID)

l_value_diff_len_gr_0

      height sex   l ID
    1     20   1 3.1  1
    2     21   2 4.0  2
    3     22   1 5.0  3
    4     21   1 4.0  4
    5     19   2 3.0  5

Is there any problem?

jhyeon
  • 456
  • 4
  • 14