
I'm a total noob at R and I've tried (and retried) to search for an answer to the following problem, but I've not been able to get any of the proposed solutions to do what I'm interested in.

I have two lists of named elements, with each element pointing to data frames with identical layouts:

(EDIT)

df1 <- data.frame(A=c(1,2,3),B=c("A","B","C"))
df2 <- data.frame(A=c(98,99),B=c("Y","Z"))
lst1 <- c(X=df1,Y=df2)
df3 <- data.frame(A=c(4,5),B=c("D","E"))
lst2 <- c(X=df3)

(EDIT 2)

So it seems like storing multiple data frames in a list is a bad idea, as it will convert the data frames to lists. So I'll go out looking for an alternative way to store a set of named data frames.

In general the names of the elements in the two lists might overlap partially, completely, or not at all.

I'm looking for a way to merge the two lists into a single list:

<some-function-sequence>(lst1, lst2)
->
c(X=rbind(df1,df3),Y=df2)

-resulting in something like this:

[EDIT: Syntax changed to correctly reflect desired result (list-of-data frames)]

$X
  A B
1 1 A
2 2 B
3 3 C
4 4 D
5 5 E

$Y
   A B
1 98 Y
2 99 Z

I.e:

  • IF the lists contain identical element names, each pointing to a data frame, THEN I want to 'rbind' the rows from these two data frames and assign the resulting data frame to the same element name in the resulting list.
  • Otherwise the element names and data frames from both lists should just be copied into the resulting list.
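The two rules above can be sketched directly as a small loop. This is only an illustration: `merge_by_name` is a hypothetical helper name, and it assumes the data frames are collected with `list(...)` so that each element is still a data frame, that every element is named, and that names are unique within each list.

```r
df1 <- data.frame(A = c(1, 2, 3), B = c("A", "B", "C"), stringsAsFactors = FALSE)
df2 <- data.frame(A = c(98, 99), B = c("Y", "Z"), stringsAsFactors = FALSE)
df3 <- data.frame(A = c(4, 5), B = c("D", "E"), stringsAsFactors = FALSE)

# Hypothetical helper: rbind data frames that share a name, copy the rest.
merge_by_name <- function(a, b) {
  out <- a
  for (nm in names(b)) {
    if (nm %in% names(a)) {
      # Same name in both lists: stack the rows
      out[[nm]] <- rbind(a[[nm]], b[[nm]])
    } else {
      # Name only in b: copy the data frame over unchanged
      out[[nm]] <- b[[nm]]
    }
  }
  out
}

merged  <- merge_by_name(list(X = df1, Y = df2), list(X = df3))
no_clash <- merge_by_name(list(X = df1, Y = df2), list(Z = df3))
str(merged)
```

Elements that exist only in the first list are kept because the result starts as a copy of it.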

I've tried the solutions from a number of related discussions, but I've not been able to find the right one. A general problem seems to be that the data frame ends up being converted into a list by the application of 'mapply/sapply/merge/...', and usually also sliced and/or merged in ways that I am not interested in. :)

Any help with this will be much appreciated!

[SOLUTION] The solution seems to be to change the use of c(...) when collecting data frames to list(...) after which the solution proposed by Pierre seems to give the desired result.
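A small sketch of why the switch from `c(...)` to `list(...)` matters, using the example data from the question (the names `flat` and `nested` are only for illustration): `c()` flattens each data frame into its columns, while `list()` keeps each data frame as a single element.

```r
df1 <- data.frame(A = c(1, 2, 3), B = c("A", "B", "C"), stringsAsFactors = FALSE)
df2 <- data.frame(A = c(98, 99), B = c("Y", "Z"), stringsAsFactors = FALSE)

flat   <- c(X = df1, Y = df2)     # columns of both data frames, side by side in one list
nested <- list(X = df1, Y = df2)  # two elements, each still a data frame

names(flat)               # "X.A" "X.B" "Y.A" "Y.B"
names(nested)             # "X" "Y"
is.data.frame(nested$X)   # TRUE
is.data.frame(flat$X.A)   # FALSE, it is a plain numeric vector
```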

RBA
  • Take some time to create a few example data frames. The pseudo code is not specific enough to describe the inner structure of what you are working with. `c(a=,..` will break the data frame up into list elements for each column. I doubt that your real data reflects this. – Pierre L Mar 10 '16 at 17:27
  • If you have the data stored in lists as in `list(a=df1, b=df2)` then you can use a split and rbind `lapply(split(c(lst1, lst2), names(c(lst1,lst2))), function(lst) do.call(rbind, lst))` – Pierre L Mar 10 '16 at 17:30
  • Thanks for your reply Pierre. I've tried your suggestion above, but have not been able to get it to work. I've added some example data to my question and the result of running your command on it. – RBA Mar 11 '16 at 08:47
  • Please read my suggestion again. "If you have the data stored in **LISTS** as in `LIST!!!(a=df1, b=df2)`". Did you bother reading that part at all?? Look at your code and tell me if you have that. Why would you use my solution when you do not have the data in the form that I asked? How do you expect it to work? – Pierre L Mar 11 '16 at 09:35
  • Are you aware of what `c(df1, df2)` does to data frames? It breaks the data frame up. – Pierre L Mar 11 '16 at 09:38
  • I don't want to get into a discussion of semantics on this, but in my (admittedly naive) understanding of R I would say that my data was stored in lists, but I see now that you talk about lists-of-lists whereas I talked about lists-of-data frames. So I guess that means that your suggested solution doesn't apply to my problem, but thanks anyway for your time. – RBA Mar 11 '16 at 09:53
  • Do you need factors? Are you aware of what they are? R has a default conversion of character strings like `"A"` to class `factor`. They are used for categorical data in models. Did you mean to create "A" as a factor or do you just need the letter "A" as is? – Pierre L Mar 11 '16 at 10:34
  • I'm aware of what factors are. In my data sets I have some text columns but they are not of interest, so I actually don't care if they are created as factors or strings. I added the second column with letters primarily to make it easier to distinguish them in the desired result. – RBA Mar 11 '16 at 11:02
  • in that case my answer below should work – Pierre L Mar 11 '16 at 11:05
  • why did you change your output after I answered? Why did you say before that you wanted a list as output? Now you want data frames? – Pierre L Mar 11 '16 at 11:06
  • After I became aware of the "problem" with collecting data frames with `c(...)` I came across this article [link](http://stackoverflow.com/questions/17499013/how-do-i-make-a-list-of-data-frames) showing how to use `list(...)` instead of `c(...)` to collect the data frames. With that small change your solution seems to produce the desired result. – RBA Mar 11 '16 at 11:11

2 Answers


The following solution is probably not the most efficient way. However, if I got your problem right this should work ;)

# Example data

# Some vectors
a <- 1:5
b <- 3:7
c <- rep(5, 5)
d <- 5:1

# Some dataframes, data1 and data3 have identical column names
data1 <- data.frame(a, b)
data2 <- data.frame(c, b)
data3 <- data.frame(a, b)
data4 <- data.frame(c, d)

# 2 lists
list1 <- list(data1, data2)
list2 <- list(data3, data4)

# Loop, which checks the column names and rbinds data frames with identical column names
final_list <- list1
used_lists <- integer()

for (i in seq_along(list1)) {
  for (j in seq_along(list2)) {
    # identical() also handles data frames with a different number of columns
    if (identical(colnames(list1[[i]]), colnames(list2[[j]]))) {
      # Append to final_list[[i]] so that several matches accumulate
      final_list[[i]] <- rbind(final_list[[i]], list2[[j]])
      used_lists <- c(used_lists, j)
    }
  }
}

# Adding the other data frames, which did not have the same column names
for (j in seq_along(list2)) {
  if (!(j %in% used_lists)) {
    final_list[[length(final_list) + 1]] <- list2[[j]]
  }
}

# Final list, which includes all other lists
final_list
Joachim Schork
  • Thanks for the reply. unfortunately I'm getting the following result when running your code on my example data: `list1 <- lst1 list2 <- lst2 (...) Error in if (sum(colnames(list1[[i]]) == colnames(list2[[j]])) == ncol(list1[[i]])) { : argument is of length zero` – RBA Mar 11 '16 at 09:31
  • If you want to create a list of dataframes you have to write list(df1, df2) instead of c(df1, df2). If you do that my code should work. – Joachim Schork Mar 11 '16 at 11:55

Here is a proposed solution using split and c to combine like terms. Please read the caveat at the bottom:

s <- split(c(lst1, lst2), names(c(lst1,lst2))) 
lapply(s, function(lst) do.call(function(...) unname(c(...)), lst))
# $X.A
# [1] 1 2 3 4 5
# 
# $X.B
# [1] "A" "B" "C" "D" "E"
# 
# $Y.A
# [1] 98 99
# 
# $Y.B
# [1] "Y" "Z"

This solution relies on the strings NOT being stored as factors. With factors it will not throw an error, but the factors will be converted to numbers. Below I show how I transformed the data to remove factors. Let me know if you require factors:

df1 <- data.frame(A=c(1,2,3),B=c("A","B","C"), stringsAsFactors=FALSE)
df2 <- data.frame(A=c(98,99),B=c("Y","Z"), stringsAsFactors=FALSE)
lst1 <- c(X=df1,Y=df2)
df3 <- data.frame(A=c(4,5),B=c("D","E"), stringsAsFactors=FALSE)
lst2 <- c(X=df3)

If the data frames are collected with `list(...)` rather than `c(...)`, so that each list element is a whole data frame, we can use:

lapply(split(c(lst1, lst2), names(c(lst1,lst2))), function(lst) do.call(rbind, lst))
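
As a rough end-to-end check, the split-and-rbind approach can be run on the question's example once the data frames are collected with `list(...)`. This is a sketch, not a definitive implementation: `merge_df_lists` is a hypothetical wrapper name, and the `unname()` call is an addition of mine, since passing a named list to `rbind` can prefix the row names with the element name. It assumes every list element is named.

```r
df1 <- data.frame(A = c(1, 2, 3), B = c("A", "B", "C"), stringsAsFactors = FALSE)
df2 <- data.frame(A = c(98, 99), B = c("Y", "Z"), stringsAsFactors = FALSE)
df3 <- data.frame(A = c(4, 5), B = c("D", "E"), stringsAsFactors = FALSE)

lst1 <- list(X = df1, Y = df2)
lst2 <- list(X = df3)

# Hypothetical wrapper around the split/rbind one-liner
merge_df_lists <- function(a, b) {
  both <- c(a, b)                      # c() on two lists keeps each data frame intact
  groups <- split(both, names(both))   # group the data frames by element name
  # unname() drops the element names so the row names stay plain 1, 2, 3, ...
  lapply(groups, function(g) do.call(rbind, unname(g)))
}

merged <- merge_df_lists(lst1, lst2)
merged$X   # five rows: the rows of df1 followed by the rows of df3
merged$Y   # the two rows of df2
```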
Pierre L