1

I'm having an issue connecting two ID's and there corresponding categorical values for two different tables.

I have 2 example tables defined as,

table1

ID
123
222
451
314
951

table2

ID Category ID
123 1
123 1
451 2
451 2
101 3

What I want to do is align the Category ID in table2 so that it correctly corresponds with the ID in table1 and also, if an id in table1 appears that is not in table2, the category ID should take the value -1. Here is what I want table1 to look like:

ID Category ID
123 1
222 -1
451 2
314 -1
951 -1

I have a solution that (I think) works, code below, but it is computationally very long as I have over 500,000 data points in each table, so I would much prefer a simpler solution, which I'm sure exists, but I cannot come up with. Any help would be great, thanks.

new_category_id = numeric()
for (i in 1:nrow(table1)){
  for (j in 1:nrow(table2)){
    
    if (table1$ID[i] == table2$ID[j]){
      new_category_id = append(new_category_id, table2$"Category ID"[j])
      break
      
    }}

  if (!(table1$ID[i] %in% table2$ID)){
    new_category_id = append(new_category_id, -1)
  }
}

table1["Category ID"] = new_category_id
benson23
  • 16,369
  • 9
  • 19
  • 38

2 Answers2

1

You can left_join the two, and replace NA using replace_na() in mutate().

library(dplyr)

left_join(table1, 
          distinct(table2),
          by  = "ID") %>% 
  mutate(CategoryID = replace_na(CategoryID, -1))

   ID CategoryID
1 123          1
2 222         -1
3 451          2
4 314         -1
5 951         -1
benson23
  • 16,369
  • 9
  • 19
  • 38
1

Here is a data.table alternative. You can join the tables together by Category ID and replace NA where there is no match with -1. It is not clear if Category_ID is always the same for the same ID; in this case, just keeping the first match in the second data.table.

library(data.table)

setDT(dt1)
setDT(dt2)

dt3 <- dt2[dt1, on = .(ID), mult = "first", nomatch = NA]
setnafill(dt3, cols = "Category_ID", fill = -1)

Output

    ID Category_ID
1: 123           1
2: 222          -1
3: 451           2
4: 314          -1
5: 951          -1
Ben
  • 28,684
  • 5
  • 23
  • 45