0

I have two datasets as follow:

Dataset 1:

Col1 Col2
a 1
a 2
a 3
b 4
b 5
b 6
a 77
a 88
b 99
b 00

Dataset 2:

Col1 Col2 Col3
a 1 X
a 2 X
a 3 Y
b 4 Y
b 5 Z
b 6 Z

I am looking to generate:

Col1 NewCol2
a X
a Y
b Y
b Z
a 77
a 88
b 99
b 00

I really appreciate any input. This will save a lot of time in the future.

Thank you!

I do this on excel but I am trying to automate it.

Andre Wildberg
  • 12,344
  • 3
  • 12
  • 29
Sundew
  • 13
  • 4
  • 3
    This is a merge/join operation, cleaned up with a coalescing (ifelse/is.na) transformation. Base R: `merge(Dataset1, Dataset2, by = c("Col1", "Col2"), all.x = TRUE) |> transform(NewCol2 = ifelse(is.na(Col3), Col2, Col3)) |> subset(select = -c(Col2, Col3)) |> unique()`. Dplyr: `left_join(Dataset1, Dataset2, by = c("Col1", "Col2")) %>% mutate(NewCol2 = coalesce(Col3, Col2)) %>% select(-Col2, -Col3) %>% unique()` – r2evans Mar 23 '23 at 17:08
  • Thank you for this. Can you please explain the logic of this code in simple english? – Sundew Mar 23 '23 at 17:13
  • I can only scratch the surface; the dupe-links do a much better job about explaining and visually displaying what is happening in a merge/join operation. It's certainly a "core" operation in data-management in that once you get it, it is useful in countless ways for normalizing, cleaning, and combining data. It's not something Excel does in a spreadsheet (its PowerQuery can do it), so it is often a new trick to learn. – r2evans Mar 23 '23 at 19:14

0 Answers0