4

I'm hoping someone out there might be able to help me get to the root of a frustrating problem I've been having with my code in R. I have a list consisting of dataframes, and I want to left join each element on one of two OTHER dataframes (call them A and B). Which of these secondary dataframes to join on depends on the element's position in the list. For my purposes, I'd like every odd element to be left-joined to A, and every even element to be left-joined to B.

library(dplyr)
DF <- data.frame(Num = c("1","2"), Let = c("a","b"), stringsAsFactors = FALSE)
A <- data.frame(Let = c("a","b"), Col = c("Yellow","Red"), stringsAsFactors = FALSE)
B <- data.frame(Let = c("a","b"), Col = c("Green","Blue"), stringsAsFactors = FALSE)
LIST <- list(DF, DF)

So far, I've tried to do this in two different ways. The first approach involved an if-else statement. If I apply such a statement to assign an integer value based on position, I obtain the expected result. Similarly, when I do away with the if-else statement and simply perform a series of left-joins on the list elements, everything works as expected.

lapply(seq_along(LIST), function(x, {ifelse((x %% 2)==0, y[[x]] <- 1, y[[x]] <- 2)}, y = LIST)
lapply(seq_along(LIST), function(x, {left_join(y[[x]], A, by = c("Let"))}, y = LIST)

Where I run into problems is when I attempt to combine the if-else statement and the left join. In particular, I end up with a list consisting of lists, each of which retains only the first column of the original corresponding dataframe.

lapply(seq_along(LIST), function(x, y) {ifelse((x %% 2)==0, left_join(y[[x]], A, by = c("Let")), left_join(y[[x]], B, by = c("Let")))}, y = LIST)

Here is the output I would LIKE to obtain:

[[1]]
  Let Num    Col
1   a   1 Yellow
2   b   2    Red

[[2]]
  Let Num   Col
1   a   1 Green
2   b   2  Blue

I'm certain there's an absurdly simple solution to the problem. Can anyone see it?

Thanks in advance! Matthew

P.S.: I also attempted a second approach, applying subsetting rather than an if-else statement. Again, however, I run into problems. The first line below works as expected, but the second returns an error, as though R doesn't recognize the list indices:

lapply(seq_along(LIST), function(x, y) {left_join(y[[x > 0]], A, by = c("Let"))}, y = LIST)
lapply(seq_along(LIST), function(x, y) {left_join(y[[x == 1]], A, by = c("Let"))}, y = LIST)

Error in y[[x == 1]] : attempt to select less than one element in integerOneIndex 
  • 1
    Can you please edit your post to include your expected output. The first two `lapply` statements throw errors. – Maurits Evers Apr 14 '18 at 15:54
  • Hi Maurits Evers - thank you for your incredibly prompt reply! I have edited the question just as you requested. The first two statements threw errors because I absentmindedly forgot to call dplyr. – M. Stargardter Apr 14 '18 at 16:40
  • As an aside, instead of saying repeatedly that you don't want `stringsAsFactors`, say `library(hellno)` lol – BroVic Apr 14 '18 at 17:00
  • install.packages("hellno") throws an error :p – M. Stargardter Apr 14 '18 at 17:09

5 Answers5

3

I'm not entirely sure I understand your issue.

The following solution is based on a reproduction of the output of lapply(seq_along(LIST), function(x, y) {left_join(y[[x > 0]], A, by = c("Let"))}, y = LIST) from your postscript. Note that the other lapply lines throw errors.

library(tidyverse);
map(list(A, B), function(x) left_join(DF, x))
#Joining, by = "Let"
#Joining, by = "Let"
#[[1]]
#  Num Let    Col
#1   1   a Yellow
#2   2   b    Red
#
#[[2]]
#  Num Let   Col
#1   1   a Green
#2   2   b  Blue

We use purrr:map with dplyr::left_join to join A and B with DF.


The same can be achieved in base R using Map and merge:

mapply(function(x) merge(DF, x, by = "Let"), list(A, B), SIMPLIFY = F)
#[[1]]
#  Let Num    Col
#1   a   1 Yellow
#2   b   2    Red
#
#[[2]]
#  Let Num   Col
#1   a   1 Green
#2   b   2  Blue
Maurits Evers
  • 49,617
  • 4
  • 47
  • 68
  • Hi! This certainly does the trick ... however, the issue would be to extend this approach to a list containing an arbitrarily large number of elements. Thus, if I had a list consisting of 100 items, how would I go about merging the 1st, 3rd, 5th, ... 99th items to dataframe A, and the 2nd, 4th, 6th ... 100th items to dataframe B? – M. Stargardter Apr 14 '18 at 16:49
2

Overview

Use base::mapply() to return a list of data frames which were conditionally merged. Here, I supply two inputs:

  1. seq.along( along.with = LIST ) to obtain the number of elements in LIST; and
  2. LIST itself.

The FUN argument is an anonymous function that takes in two inputs - i and j - and tests whether the current element in LIST is even or odd numbered prior to performing a left-join using base::merge().

If the result of the modulus operator for the ith element in seq.along( along.with = LIST ) is equal to zero, then left-join B onto the jth element in LIST; if it does not equal zero, then perform the left-join A onto the jth element in LIST.

# load data
DF <- data.frame(Num = c("1","2"), Let = c("a","b"), stringsAsFactors = FALSE)
A <- data.frame(Let = c("a","b"), Col = c("Yellow","Red"), stringsAsFactors = FALSE)
B <- data.frame(Let = c("a","b"), Col = c("Green","Blue"), stringsAsFactors = FALSE)
LIST <- list(DF, DF)

# goal: left join all odd elements in LIST[[j]]
#       to `A` and all even elements to `B`
merged.list <- 
  mapply( FUN = function( i, j )
          if( i %% 2 == 0 ){
            merge( x = j
                   , y = B
                   , by = "Let"
                   , all.x = TRUE )
          } else{
            merge( x = j
                   , y = A
                   , by = "Let"
                   , all.x = TRUE )
          }
        , seq_along( along.with = LIST )
        , LIST
        , SIMPLIFY = FALSE )

# view results
merged.list
# [[1]]
# Let Num    Col
# 1   a   1 Yellow
# 2   b   2    Red
# 
# [[2]]
# Let Num   Col
# 1   a   1 Green
# 2   b   2  Blue

# end of script #

Tidyverse approach

The results are replicated below using functions from the purrr and dplyr packages.

library( dplyr )
library( purrr )

merged.list <-
  map2( .x = seq_along( along.with = LIST )
        , .y = LIST
        , .f = function( i, j )
          if( i %% 2 == 0 ){
            left_join( x = j
                       , y = B
                       , by = "Let" )
          } else{
            left_join( x = j
                       , y = A
                       , by = "Let" )
          })

# view results
merged.list
# [[1]]
# Num Let    Col
# 1   1   a Yellow
# 2   2   b    Red
# 
# [[2]]
# Num Let   Col
# 1   1   a Green
# 2   2   b  Blue

# end of script #
Cristian E. Nuno
  • 2,822
  • 2
  • 19
  • 33
  • 1
    Thanks so much for your prompt reply to my query! I think this may actually solve my problem :D - although I still need to play around with the code a bit to verify. For some silly reason, I had it in my head that one can only use mapply if he or she has two lists/vectors of equal length. – M. Stargardter Apr 14 '18 at 17:08
  • @M.Stargardter no problem! In this case, the lengths of `LIST` and `seq.along( LIST )` are equal. However, for cases where this is not true, the unequal input will be recycled. For more on that, see [under what circumstances does R recycle?](https://stackoverflow.com/questions/6555651/under-what-circumstances-does-r-recycle) and [How to define multiple variables with lapply?](https://stackoverflow.com/a/48598720/7954106). – Cristian E. Nuno Apr 14 '18 at 17:16
1

It's possible that MauritsEvers has already answered your question but I thought I would address the obvious errors in R syntax and programming logic. Focus on the first lapply-call:

lapply(seq_along(LIST), function(x, {ifelse((x %% 2)==0, y[[x]] <- 1, y[[x]] <- 2)}, y = LIST)

First and perhaps trivial is the missing closing ) argument list in the first lapply-action. Next and more fundamental is the incorrect use of ifelse as a programming construct. The ifelse function is not designed for serial tests of data-objects. It is only designed to be applied along a single vector. The if(.){.}else{.}-function probably should have been used in that lapply call if serial choices were to be made.

However, (and now trying to implement the first paragraph rather than continue correecting code) I think it would be much simpler to use logical indexing (with R's implicit recycling process) on the LIST-object rather than any looping process. (This is not a tidyverse solution.) This code would segment the LIST into "odd" and "even" components:

  oddList <- LIST[ c(TRUE,FALSE) ]  # implicit seq-along by virtue of recycling
  evenList <- LIST[ c(FALSE,TRUE) ]

We could use that type of results to make two one-liners that would accomplish your stated goals. I made the LIST-object four wide rather than two wide.

Abig <- Reduce( function(x,y) {merge(x,y,by="Let")}, LIST, init=A)
Warning message:
In merge.data.frame(x, y, by = "Let") :
  column names ‘Num.x’, ‘Num.y’ are duplicated in the result
Bbig <- Reduce( function(x,y) {merge(x,y,by="Let")}, LIST, init=B)
Warning message:
In merge.data.frame(x, y, by = "Let") :
  column names ‘Num.x’, ‘Num.y’ are duplicated in the result

That is only a warning and here you can see what it was warning about:

> Abig
  Let    Col Num.x Num.y Num.x Num.y
1   a Yellow     1     1     1     1
2   b    Red     2     2     2     2

If you need those duplicated column names labeled uniquely (and I thin that would be a good idea), then:

names(Abig)[ grep("Num", names(Abig)) ] <- 
                    paste0("Num.", seq_along( grep("Num", names(Abig)) ) )
Abig
  Let    Col Num.1 Num.2 Num.3 Num.4
1   a Yellow     1     1     1     1
2   b    Red     2     2     2     2
IRTFM
  • 258,963
  • 21
  • 364
  • 487
  • 1
    thanks for taking the time to prepare such a detailed response! I particularly appreciate your note regarding the appropriateness of the if-else statement. Others observed this as well, so I will definitely exercise more caution in applying this in the future. – M. Stargardter Apr 15 '18 at 01:33
1

This solution is quite similar to mapply based solution already posted here by (@MauritsEvers & @aspiringurbandatascientist) but it usages a different approach to join data.frames. The dplyr::left_join has been used to fit the purpose.

library(dplyr)
# Using mapply and left_join
mapply(function(x,y){
  if(y %% 2 == 1){
    left_join(x, A, by="Let")
  }else {
    left_join(x, B, by="Let")
  }
}, LIST, seq_along(LIST), SIMPLIFY = FALSE)

# [[1]]
#   Num Let    Col
# 1   1   a Yellow
# 2   2   b    Red
# 
# [[2]]
#   Num Let   Col
# 1   1   a Green
# 2   2   b  Blue
MKR
  • 19,739
  • 4
  • 23
  • 33
  • 1
    It's always helpful to see multiple approaches to solving a problem - thanks so much for taking the time to respond to my question! – M. Stargardter Apr 15 '18 at 01:29
  • 1
    @M.Stargardter You are always welcome. I appreciate question with enough data to reproduce the problem. – MKR Apr 15 '18 at 06:33
0

I reworked a bit your example data for clarity

data

DF1 <- data.frame(Num1 = c("1","2"), Let = c("a","b"), stringsAsFactors = FALSE)
DF2 <- data.frame(Num2 = c("3","4"), Let = c("a","b"), stringsAsFactors = FALSE)
DF3 <- data.frame(Num3 = c("5","6"), Let = c("a","b"), stringsAsFactors = FALSE)
DF4 <- data.frame(Num4 = c("7","8"), Let = c("a","b"), stringsAsFactors = FALSE)
A <- data.frame(Let = c("a","b"), Col = c("Yellow","Red"), stringsAsFactors = FALSE)
B <- data.frame(Let = c("a","b"), Col = c("Green","Blue"), stringsAsFactors = FALSE)
LIST <- list(DF1, DF2, DF3, DF4)

Solution

library(dplyr)
library(purrr)
LIST_odd  <- LIST[as.logical(seq_along(LIST)%%2)]
LIST_even <- LIST[!as.logical(seq_along(LIST)%%2)]
merge_odd <- reduce(LIST_odd,left_join,.init=A) 
#   Let    Col Num1 Num3
# 1   a Yellow    1    5
# 2   b    Red    2    6

merge_even <- reduce(LIST_even,left_join,.init=B)
#   Let   Col Num2 Num4
# 1   a Green    3    7
# 2   b  Blue    4    8

If you don't want to use purrr those give the same result with only dplyr and base:

Reduce(left_join,LIST_odd,A)
Reduce(left_join,LIST_even,B)

Or 100% base:

Reduce(function(x,y) merge(x,y,all.x=TRUE),LIST_odd,A)
Reduce(function(x,y) merge(x,y,all.x=TRUE),LIST_even,B)
moodymudskipper
  • 46,417
  • 11
  • 121
  • 167