I'm working in a loop and generating a df on each iteration. As I iterate, I am joining the results into one big table. The following code works as intended, but seems overly complicated. Is there a way to simplify this so I don't have to have an if/else block?

if(exists("ModelOutput.Full")){
  ModelOutput.Full <- ModelOutput.Full%>%
    distinct()%>%
    left_join(ModelOutput, by = "ID")
} else {
  ModelOutput.Full <- ModelOutput
}

I was hoping to just use the else code and have it create ModelOutput.Full on the first iteration, but that doesn't happen.

Also, feel free to suggest other optimizations that I'm not asking about. I'm sure they exist.

Edit 2: Thanks to DSGym's input, I've gotten this working, though it took slight modification of their answer, as I didn't provide reproducible code in my initial question. Here's an illustration of what worked for me:

library(dplyr)

regions <- c(1:7)
drivers <- c(1:5)
ModelOutput <- list()
ModelOutput.Regional <- list()
# One-column data frame of IDs shared by every driver table
ID <- c(1:6961896)%>%
  as.vector()%>%
  as.data.frame()%>%
  rename("ID"=".")
for (region in regions) {
  for (driver in drivers) {
    vals <- sample(0:10, 6961896, replace = TRUE)/10
    outName <- paste("driver",driver,sep="")
    # Name the score column after the driver and attach the IDs
    vals <- vals%>%
      as.vector()%>%
      as.data.frame()%>%
      rename(!!outName := ".")%>%
      bind_cols(ID)
    ModelOutput[[driver]] <- vals
  }
  # Join this region's driver tables on ID
  ModelOutput.Regional[[region]] <- as.data.frame(Reduce(function(x, y) merge(x, y, by = "ID", all.x = TRUE), ModelOutput))
}
# Stack the regional tables into one data frame
ModelOutput.Full <- Reduce(function(x, y) bind_rows(x, y), ModelOutput.Regional)

This generates my desired output of a giant data frame with all the regional data and the scores of each 'driver' in labeled columns like this:

ID  driver1 driver2 driver3 driver4 driver5
1     0.1     0.2     0.4     0.6     0.4
2     0.4     0.6     0.5     0.7     0.7
3     0.3     0.7     0.5     0.2     0.3
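
Since the question invites other optimizations: every driver table in the example shares the same IDs in the same order, so the per-region merge can be skipped by building the driver columns directly. The following is only a minimal sketch under that assumption, scaled down to 10 rows. The names `n` and `make_region` and the extra `region` column are introduced here for illustration and are not part of the original code.

```r
set.seed(1)    # only to make the toy example repeatable
n <- 10        # scaled down from 6961896 for illustration
regions <- 1:7
drivers <- 1:5

# Hypothetical helper: build one region's table without any merge.
make_region <- function(region) {
  # One vector of random scores per driver, named driver1..driver5
  scores <- lapply(drivers, function(d) sample(0:10, n, replace = TRUE) / 10)
  names(scores) <- paste0("driver", drivers)
  # data.frame() expands the named list into one column per driver
  data.frame(ID = seq_len(n), region = region, scores)
}

ModelOutput.Regional <- lapply(regions, make_region)
# Stack the regional tables row-wise
ModelOutput.Full <- do.call(rbind, ModelOutput.Regional)
```

If the driver tables did not share identical IDs, a join on ID would still be needed.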
ffollett
  • It's easier to help you if you include a simple [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input and desired output that can be used to test and verify possible solutions. It's probably better not to modify the table in the loop. Just calculate a list of values and then combine them at the end. – MrFlick Apr 26 '19 at 20:13
  • Thanks for the suggestion. I've added a repEx and implemented the lists, but I'm still unsure how to combine. – ffollett Apr 30 '19 at 16:53

2 Answers

As MrFlick mentioned in the comment above, it is easier to combine the data frames at the end. You can do something like the following.

Since I don't know what your loop structure looks like, I will assume you can generate a list of data frames called dfs.

# method 1
ModelOutput.Full = dplyr::bind_rows(dfs)

# method 2
ModelOutput.Full = do.call("rbind", dfs)
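
As a small illustration of the two methods, here is a sketch using a made-up list `dfs` of three toy data frames (the column names and values are invented for the example). The dplyr call is guarded so the sketch still runs where only base R is available.

```r
# Hypothetical list of per-iteration results; names and values are made up.
dfs <- lapply(1:3, function(i) {
  data.frame(ID = 1:2, iteration = i, score = c(0.1, 0.2) * i)
})

# Base R: stack all data frames row-wise in a single call.
full_base <- do.call(rbind, dfs)

# dplyr equivalent, run only if dplyr is installed.
if (requireNamespace("dplyr", quietly = TRUE)) {
  full_dplyr <- dplyr::bind_rows(dfs)
}
```

Both calls stack rows. If each iteration should instead contribute new columns matched on ID, a join is needed rather than a row bind.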
Joe

Not 100% sure how to do it without a reproducible example, but I think this should help:

  1. Store all data frames in a list

storelist <- list() ## Store all your data frames

  2. Use the loop to store each one like this

for (i in seq_along(dfs)) {
   storelist[[i]] <- dfs[[i]]
}

  3. Use this function to join all data frames by ID

# Reduce() passes the running result (x) and the next data frame (y) to merge()
joined <- Reduce(function(x, y) merge(x, y, by = 'ID', all.x = TRUE), storelist)
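
To make the join step concrete, here is a minimal sketch with three made-up tables that share an ID column (the values are invented for illustration). It also shows two behaviours of base R's `Reduce()` that are easy to trip over: the function it is given must take two arguments, and the result is returned rather than written back into the list.

```r
# Hypothetical inputs: three small tables that share an ID column.
storelist <- list(
  data.frame(ID = 1:3, driver1 = c(0.1, 0.4, 0.3)),
  data.frame(ID = 1:3, driver2 = c(0.2, 0.6, 0.7)),
  data.frame(ID = 1:3, driver3 = c(0.4, 0.5, 0.5))
)

# Reduce() calls the function with the accumulated result (x) and the next
# list element (y), so the function must accept two arguments.
joined <- Reduce(function(x, y) merge(x, y, by = "ID", all.x = TRUE), storelist)

# Reduce() does not modify storelist; the result has to be assigned.
# With an empty list and no init value, it returns NULL.
empty_result <- Reduce(function(x, y) merge(x, y, by = "ID"), list())
```

Here `joined` has one row per ID and one column per driver, while `empty_result` is `NULL` because there was nothing to combine.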
DSGym
  • I added a repEx and implemented lists, but in a different way than you suggested. I couldn't get the Reduce to work, though. Even just working on an instance of modelOutput, I ran "Reduce(function(modelOutput) merge(modelOutput, by="ID", all.x=TRUE), storelist)" and it returns NULL without modifying storelist. – ffollett Apr 30 '19 at 16:56