Create R Loop for two datasets with multiple used IDs

Question

I have two datasets from the Global Energy Monitor and want to merge them. Both datasets show the “TrackerID” of different power plants. Dataset1 lists every TrackerID 1, 2, 3, ..., i. In Dataset2, some TrackerIDs are missing, while others are used several times. Therefore, one TrackerID in Dataset2 can have different values of the variable "signatory".

I want to create a weighted signatory variable calculating the weighted average of the different signatory status given in Dataset2 and include it in Dataset1. If TrackerID does not exists in Dataset2, set NA. If it does, calculate the weighted average.

I started to build a loop but it doesn’t work. This is what I’ve got so far, but maybe I’m on the wrong track:

allIDs <- as.list(Dataset1$TrackerID)
k = 1
for(i in allIDs){
  Dataset1$weightedsignatory[k] <- i
  k <- k + 1
}

This is a sample dataset to better understand my question:

Dataset1 <- data.frame(TrackerID = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10),
             Name = c("a", "b", "c", "d", "e", "f", "g", "h", "i", "j"))

Dataset2 <- data.frame(TrackerID = c(1, 3, 3, 6, 7, 7, 7, 10),
                       Signatory= c(1, 1, 0, 0, 1, 0, 1, 1))

This is what the Dataset created by the loop should look like:

Dataset.wished <- data.frame(TrackerID = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10),
                       Name = c("a", "b", "c", "d", "e", "f", "g", "h", "i", "j"),
                       w.Signatory = c(1, "NA", 0.5, "NA", "NA", 0, 0.66, "NA", "NA", 1))

Any help is highly appreciated!

Welcome to SO, Lilly! Questions on SO (especially in R) do much better if they are reproducible and self-contained. By that I mean including attempted code (please be explicit about non-base packages), sample representative data (perhaps via `dput(head(x))` or building data programmatically (e.g., `data.frame(...)`), possibly stochastically), perhaps actual output (with verbatim errors/warnings) versus intended output. Refs: https://stackoverflow.com/q/5963269, [mcve], and https://stackoverflow.com/tags/r/info. — r2evans, Jan 09 '23 at 15:01
From what you've shared I can't tell whether you've already created the weighted average or if you need help with that part - please see r2evans's links to help make the question clear and reproducible. As for the merging part, also have a look a the FAQ we have [How to merge/join data in R?](https://stackoverflow.com/q/1299871/903061). — Gregor Thomas, Jan 09 '23 at 15:05
This is basically two problems wrapped into 1. First you want to aggreate your data to get one row per tracker ID then merge it together. With base R, you can do this with `merge(Dataset1, aggregate(Signatory~TrackerID, Dataset2, mean), all.x=TRUE)`. There's no need for looping. — MrFlick, Jan 09 '23 at 16:03

score 0 · Answer 1 · answered Jan 09 '23 at 22:05

Here a tidyverse approach

Code

library(dplyr)

Dataset1 %>% 
  left_join(Dataset2) %>% 
  group_by(TrackerID,Name) %>% 
  summarise(Signatory = mean(Signatory))

Output

# A tibble: 10 x 3
# Groups:   TrackerID [10]
   TrackerID Name  Signatory
       <dbl> <chr>     <dbl>
 1         1 a         1    
 2         2 b        NA    
 3         3 c         0.5  
 4         4 d        NA    
 5         5 e        NA    
 6         6 f         0    
 7         7 g         0.667
 8         8 h        NA    
 9         9 i        NA    
10        10 j         1

Create R Loop for two datasets with multiple used IDs

1 Answers1

Code

Output