
I am trying to build a for loop that will conduct a series of full joins in dplyr.

I would like to automate this:

join1 <- full_join(Q1_output, Q2_output)
join2 <- full_join(join1, Q3_output)
join3 <- full_join(join2, Q4_output)
join4 <- full_join(join3, Q5_output)
join5 <- full_join(join4, Q6_output)
join6 <- full_join(join5, Q7_output)
join7 <- full_join(join6, Q8_output)
join8 <- full_join(join7, Q9_output)
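The chain above just feeds each result into the next full_join, so it can be written as a plain for loop. This is a minimal sketch that assumes the shared key column is called "variable" (a placeholder name, as in the answer below) and uses three small made-up data frames in place of the real Qn_output objects; `n_outputs` is a hypothetical setting for how many objects exist in a given series.

```r
library(dplyr)

# Hypothetical stand-ins for Q1_output ... Q3_output; each shares the key column "variable"
Q1_output <- data.frame(variable = 1:3, a = c(10, 20, 30))
Q2_output <- data.frame(variable = 2:4, b = c(1, 2, 3))
Q3_output <- data.frame(variable = 3:5, c = c(7, 8, 9))

n_outputs <- 3  # hypothetical: the number of Qn_output objects in this series

# Start from Q1_output, then full-join each following Qn_output onto the running result
joined <- get("Q1_output")
for (i in seq_len(n_outputs)[-1]) {
  joined <- full_join(joined, get(paste0("Q", i, "_output")), by = "variable")
}
joined
```

Because `seq_len(n_outputs)[-1]` is empty when there is only one output, the loop then simply returns Q1_output unchanged.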

The number of output data frames will not always be 9, but they will always be named in the format Qn_output, where n varies with each series of analyses.

Is there a way to write a function that does this? The outputs will always be data frames and will always be joined on a common variable. I would also appreciate any feedback on whether a similar loop could take an N-column data frame and turn it into N vectors (e.g. repeating Q1 <- data$Q1, Q2 <- data$Q2, and so on).
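For the second part of the question, which the answer below does not cover, a loop is not strictly needed: a data frame is already a list of column vectors. The following is a minimal sketch using a made-up two-column data frame named `data` (standing in for the real one); `as.list()` gives one vector per column, and `list2env()` optionally copies them into the global environment as separate objects.

```r
# Hypothetical N-column data frame standing in for `data`
data <- data.frame(Q1 = c(1, 2, 3), Q2 = c(4, 5, 6))

# A data frame is a list of columns, so as.list() yields one named vector per column
vecs <- as.list(data)

# To get separate objects Q1, Q2, ... instead, copy the list elements into the global environment
invisible(list2env(vecs, envir = globalenv()))

Q1
Q2
```

Keeping the vectors in the list (for example `vecs$Q1`) is often easier to loop over later than many loose variables.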

Thank you!

Jason

1 Answer


We can use mget to return the objects in a list:

lst <- mget(paste0("Q", 1:9, "_output"))
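Since the question says the number of Qn_output objects changes between series, the hard-coded `1:9` could instead come from the names that actually exist. This is a minimal sketch with three made-up objects; it finds them with `ls(pattern = ...)` and then sorts by the numeric part, because `ls()` sorts names as text (so Q10_output would otherwise come before Q2_output).

```r
# Hypothetical stand-ins for the real Qn_output data frames
Q1_output  <- data.frame(variable = 1:3, a = c(10, 20, 30))
Q2_output  <- data.frame(variable = 2:4, b = c(1, 2, 3))
Q10_output <- data.frame(variable = 3:5, c = c(7, 8, 9))

# Find every object named Q<n>_output, however many there are
nms <- ls(pattern = "^Q[0-9]+_output$")

# ls() sorts names as text, so reorder by the number inside each name
n <- as.integer(gsub("^Q([0-9]+)_output$", "\\1", nms))
nms <- nms[order(n)]

lst <- mget(nms)
names(lst)
```

If the count is known up front, `mget(paste0("Q", seq_len(k), "_output"))` with a hypothetical `k` does the same job.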

Then reduce the list to a single dataset with full_join:

library(tidyverse)
reduce(lst, full_join, by = 'variable')
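`reduce()` applies `full_join` pairwise from left to right, which is the same chain as the manual joins in the question; the extra `by = 'variable'` argument is passed through to each `full_join` call. For comparison, here is a minimal base R sketch of the same fold without the tidyverse, using `Reduce()` and `merge(..., all = TRUE)` (a full outer join); the list contents and the key name "variable" are made up for illustration.

```r
# Hypothetical stand-in for the list returned by mget()
lst <- list(
  Q1_output = data.frame(variable = 1:3, a = c(10, 20, 30)),
  Q2_output = data.frame(variable = 2:4, b = c(1, 2, 3)),
  Q3_output = data.frame(variable = 3:5, c = c(7, 8, 9))
)

# Fold the list left to right; all = TRUE keeps unmatched rows from both sides
joined <- Reduce(function(x, y) merge(x, y, by = "variable", all = TRUE), lst)
joined
```

One difference to be aware of: `merge()` sorts the result by the key by default, whereas `full_join()` keeps the rows of the left-hand table first.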
akrun
    This is fantastic! If I have multiple common variables across the data frames but only want to join by one, would I just add by = "variable" after full_join? – Jason Aug 16 '18 at 15:10
  • @Jason Thanks. Yes, you have to do that; I updated the post. I thought you wanted the join variables to be picked up automatically from the common columns. – akrun Aug 16 '18 at 15:12
  • Sorry for not specifying that. This worked, but it produced an error about negative values, which in the past has indicated that the data frame is too large. However, when I did the full joins manually by the common variable, this did not happen. Do you know what would cause that? – Jason Aug 16 '18 at 15:17
  • @Jason Not sure about the error with negative values. If there is a common variable you are interested in, it is always good to specify it. – akrun Aug 16 '18 at 15:18
  • It is: Error in full_join_impl(x, y, by$x, by$y, suffix$x, suffix$y, check_na_matches(na_matches)) : negative length vectors are not allowed. I got that after adding by = "variable". But when doing the 8 full joins manually, there is no error. – Jason Aug 16 '18 at 15:19
  • @Jason On a big dataset with many common columns, it joins by all of them and can then reach the size limit, as suggested [here](https://stackoverflow.com/questions/42479854/merge-error-negative-length-vectors-are-not-allowed). – akrun Aug 16 '18 at 15:27
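The thread does not settle the cause of the "negative length vectors" error. One thing that may be worth checking (an assumption, not something confirmed above) is whether the key column repeats within the data frames: every row of x with a given key is paired with every row of y that has the same key, so row counts can multiply quickly over a chain of joins. This is a minimal base R sketch with made-up data; `has_dups` is a hypothetical helper name.

```r
# Hypothetical data frames in which the key column "variable" repeats
Q1_output <- data.frame(variable = c(1, 1, 1), a = 1:3)
Q2_output <- data.frame(variable = c(1, 1, 1), b = 4:6)
lst <- list(Q1_output = Q1_output, Q2_output = Q2_output)

# Flag which inputs contain repeated key values (TRUE = duplicates present)
has_dups <- sapply(lst, function(df) anyDuplicated(df$variable) > 0)
has_dups

# Each of the 3 rows in one input matches each of the 3 rows in the other: 3 * 3 = 9 rows
joined <- Reduce(function(x, y) merge(x, y, by = "variable", all = TRUE), lst)
nrow(joined)
```

If `has_dups` is TRUE for several inputs, the final row count can grow roughly with the product of the duplicate counts.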