0

I have two datasets from the Global Energy Monitor and want to merge them using RStudio. Both datasets show the “TrackerID” of different power plants. Dataset1 lists every TrackerID 1, 2, 3, ..., i. In Dataset2, some TrackerIDs are missing, while others are used several times. Therefore, one TrackerID in Dataset2 can have different values of the variable "signatory".

I would like to merge the datasets. Each observation from dataset 1, for which there are several observations in dataset 2, is to be duplicated so that each observation from dataset 2 has its own row.

This is a sample dataset to better understand my question. The original Dataset includes way more variables and observations.

Dataset1 <- data.frame(TrackerID = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10))
Dataset2 <- data.frame(TrackerID = c(1, 3, 3, 6, 7, 7, 7, 10),
Signatory= c(1, 1, 0, 0, 1, 0, 1, 1))

This is what the Dataset created should look like:

Dataset.wished <- data.frame(TrackerID = c(1, 2, 3, 3_1, 4, 5, 6, 7, 7_1, 7_2, 8, 9, 10),
Signatory = c(1, "NA", 1, 0, "NA", "NA", 0, 1, 0, 1, "NA", "NA", 1))
Lilly
  • 17
  • 5
  • This sounds like a very straightforward application of `merge`. Have done any searching of SO or running examples in the help pages? – IRTFM Jan 20 '23 at 09:50
  • `merge(Dataset1, Dataset2, all = TRUE)` – Maël Jan 20 '23 at 09:50
  • It wasn’t clear in the problem specifications whether it should be `all=TRUE` or `all.y=TRUE` ( the second option assuming Datase2 was the second argument to merge. – IRTFM Jan 20 '23 at 09:57
  • 1
    Added 2nd link to make the TrackerID unique. – zx8754 Jan 20 '23 at 10:00

1 Answers1

0
Dataset1 <- data.frame(TrackerID = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10))
Dataset2 <- data.frame(TrackerID = c(1, 3, 3, 6, 7, 7, 7, 10),
                       Signatory= c(1, 1, 0, 0, 1, 0, 1, 1))

library(data.table)

Dataset.wished <- merge.data.table(Dataset1, Dataset2, all = T)

setDT(Dataset.wished)

Dataset.wished[, TrackerID := as.character(TrackerID)]
Dataset.wished[TrackerID == shift(TrackerID, type = "lag"), TrackerID := paste(TrackerID, 1:.N, sep = "_"), TrackerID]

Dataset.wished

# TrackerID Signatory
# 1:         1         1
# 2:         2        NA
# 3:         3         1
# 4:       3_1         0
# 5:         4        NA
# 6:         5        NA
# 7:         6         0
# 8:         7         1
# 9:       7_1         0
# 10:       7_2         1
# 11:         8        NA
# 12:         9        NA
# 13:        10         1
Merijn van Tilborg
  • 5,452
  • 1
  • 7
  • 22