Question:
How do I efficiently join two tables by cross checking 2x2 columns using R tidyverse and dplyr? I'm relatively new to R, but I cannot find this question addressed in any previous questions or discussions.
I have two tables with different numbers of rows and columns. Each table contains column A and B. These columns contain strings that can be either identical or unique, and they may also overlap or be missing from one or the other column. Basically, I need to check column A1 against both A2 and B2, and then check B1 against both A2 and B2.
Example to explain consept:
df1
ID pISSN eISSN Level
437097 1530-9932 1
489309 2366-004X 2366-0058 1
437103 0025-5858 1
437109 1042-9670 1545-7230 1
449363 1093-1139 0
437127 0949-1775 1
437124 0361-3682 1873-6289 2
481203 0103-846X 0103-846X 1
479825 2153-2184 2153-2192 0
437136 0734-2071 1557-7333 2
df2
ID pISSN eISSN Format
41120 2364-9534 E OA S C
12249 1530-9932 E OF S
261 2366-0058 E OF S
12188 0025-5858 1865-8784 PE OF S
40596 1042-9670 1545-7230 PE OF S
12129 0895-4852 1936-4709 PE OF
769 0949-1775 1432-0517 PE OF S
result
ID pISSN eISSN Level Format
437097 1530-9932 1 E OF S
489309 2366-004X 2366-0058 1 E OF S
437103 0025-5858 1865-8784 1 PE OF S
437109 1042-9670 1545-7230 1 PE OF S
437127 0949-1775 1 PE OF S
Example tables for input:
dput(df1, file = "")
structure(list(ID = c(437097, 489309, 437103, 437109, 449363, 437127, 437124, 481203, 479825, 437136), pISSN = c(NA, "2366-004X", "0025-5858", "1042-9670", "1093-1139", NA, "0361-3682", "0103-846X", "2153-2184", "0734-2071"), eISSN = c("1530-9932", "2366-0058", NA, "1545-7230", NA, "0949-1775", "1873-6289", "0103-846X", "2153-2192", "1557-7333"), Level = c(1, 1, 1, 1, 0, 1, 2, 1, 0, 2)), row.names = c(NA, -10L), class = c("tbl_df", "tbl", "data.frame"))
dput(df2, file = "")
structure(list(ID = c(41120, 12249, 261, 12188, 40596, 12129, 769), pISSN = c(NA, NA, NA, "0025-5858", "1042-9670", "0895-4852", "0949-1775"), eISSN = c("2364-9534", "1530-9932", "2366-0058", "1865-8784", "1545-7230", "1936-4709", "1432-0517"), Format = c("E OA S C", "E OF S", "E OF S", "PE OF S", "PE OF S", "PE OF", "PE OF S")), row.names = c(NA, -7L), class = c("tbl_df", "tbl", "data.frame"))