0

Suppose A is all the data that should appear for all id and year, and B is the data that is actually presented. From the information in B, it seems that not all id appear in every issue, so I want to know which id did not provide information in those years. And denote it as NA. SO... there are two dataframe A and B following down,

id <- c(1, 1, 1, 2, 2, 2, 3, 3, 3)
year <- c(2000, 2001, 2002, 2000, 2001, 2002, 2000, 2001, 2002)
A <- data.frame(id, year); colnames(A) <- c("ID", "year")
#
id <- c(1, 1, 1, 2, 2, 3)
year <- c(2000, 2001, 2002, 2001, 2002, 2002)
B <- data.frame(id, year); colnames(B) <- c("ID", "year")
cbind(A, B)

If I trying to use merge function would be like...

merge(A, B, by = "ID", all = TRUE)

but what I expected my new dataframe is like...

id <- c(1, 1, 1, 2, 2, 2, 3, 3, 3)
year.x <- c(2000, 2001, 2002, 2000, 2001, 2002, 2000, 2001, 2002)
year.y <- c(2000, 2001, 2002, NA, 2001, 2002, NA, NA, 2002)
D <- data.frame(id, year.x, year.y); D

Thank you!

Sotos
  • 51,121
  • 6
  • 32
  • 66
CHIA
  • 59
  • 5

2 Answers2

3

One way is to match both columns, i.e.

A$year.y <- A$year[match(do.call(paste0, A), do.call(paste0, B))]


  ID year year.y
1  1 2000   2000
2  1 2001   2001
3  1 2002   2002
4  2 2000     NA
5  2 2001   2000
6  2 2002   2001
7  3 2000     NA
8  3 2001     NA
9  3 2002   2002
Sotos
  • 51,121
  • 6
  • 32
  • 66
1

An updated data.table alternative for each A$id that matches a B$id, compares A$year to B$year and classify them as MATCH or NO MATCH in a a column matched based on whether they are identical or not.

id <- c(1, 1, 1, 2, 2, 2, 3, 3, 3)
year <- c(2000, 2001, 2002, 2000, 2001, 2002, 2000, 2001, 2002)
A <- data.frame(id, year); colnames(A) <- c("ID", "year.x")
#

id <- c(1, 1, 1, 2, 2, 3)
year <- c(2000, 2001, 2002, 2001, 2002, 2002)
B <- data.frame(id, year); colnames(B) <- c("ID", "year.y")



A %>% inner_join(B, by = 'ID') %>%
  mutate(matched = c('no_match', 'match')[1 + (year.x == year.y)])

 
  ID    year.x year.y  matched
1   1   2000   2000    match
2   1   2000   2001 no_match
3   1   2000   2002 no_match
4   1   2001   2000 no_match
5   1   2001   2001    match
6   1   2001   2002 no_match
7   1   2002   2000 no_match
8   1   2002   2001 no_match
9   1   2002   2002    match
10  2   2000   2001 no_match
11  2   2000   2002 no_match
12  2   2001   2001    match
13  2   2001   2002 no_match
14  2   2002   2001 no_match
15  2   2002   2002    match
16  3   2000   2002 no_match
17  3   2001   2002 no_match
18  3   2002   2002    match
 
Rfanatic
  • 2,224
  • 1
  • 5
  • 21