
I want to merge the data frames from two different lists (by column) whenever the value of a column is the same. This is my solution, but it is very slow.

merged <- list()
for (j in seq_along(s49)) {
  for (i in seq_along(s39)) {
    # if the value of column "merge" is the same
    if (s39[[i]]$merge[1] == s49[[j]]$merge[1]) {
      # merge the data frames and keep the result
      merged[[length(merged) + 1]] <- merge(s39[[i]], s49[[j]], by = "merge")
    }
  }
}

EDIT

time      lat       lon      callsign   OR   DE ICAOType   merge
1504539460 39.02001 1.482148   JAF6LY EBAW LEIB     E190 EBAW LEIB
1504539475 51.16286 4.521561   JAF6LY EBAW LEIB     E190 EBAW LEIB
1504539497 51.15481 4.502335   JAF6LY EBAW LEIB     E190 EBAW LEIB
1504539519 51.14867 4.482498   JAF6LY EBAW LEIB     E190 EBAW LEIB
1504539541 51.14499 4.455566   JAF6LY EBAW LEIB     E190 EBAW LEIB

time        lat         lon      callsign   OR   DE ICAOType  merge
1504442638 36.72127 -4.42139880   JAF32X EBAW LEMG     E190  EBAW LEIB
1504442653 51.17394  4.54910278   JAF32X EBAW LEMG     E190  EBAW LEIB
1504442675 51.16878  4.57587990   JAF32X EBAW LEMG     E190  EBAW LEIB
1504442697 51.16277  4.60563660   JAF32X EBAW LEMG     E190  EBAW LEIB
1504442719 51.15363  4.63652740   JAF32X EBAW LEMG     E190  EBAW LEIB
1504442741 51.13408  4.64803335   JAF32X EBAW LEMG     E190  EBAW LEIB
1504442763 51.11506  4.62890625   JAF32X EBAW LEMG     E190  EBAW LEIB

The first data.frame is an element of the first list, and the second data.frame is an element of the second list. Because the column "merge" has the same value in both, I want to merge them. I want to do this for all the data frames in the lists.
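One way to avoid comparing every pair in the nested loop above is to extract one key per data frame and look up the matches with `match()`. The following is only a minimal base R sketch: the small `s39` and `s49` lists are hypothetical stand-ins for the real data, it assumes each key occurs at most once in `s39`, and it assumes that "merging" means stacking the rows of the two matching data frames with `rbind()`.

```r
# Hypothetical stand-ins for s39 and s49: each element is a data frame
# whose "merge" column holds a single repeated key.
s39 <- list(
  data.frame(time = 1:2, callsign = "JAF6LY", merge = "EBAW LEIB",
             stringsAsFactors = FALSE),
  data.frame(time = 3:4, callsign = "ABC123", merge = "EBAW LEMG",
             stringsAsFactors = FALSE)
)
s49 <- list(
  data.frame(time = 5:7, callsign = "JAF32X", merge = "EBAW LEIB",
             stringsAsFactors = FALSE),
  data.frame(time = 8:9, callsign = "XYZ789", merge = "EDDF LEIB",
             stringsAsFactors = FALSE)
)

# One key per data frame, taken from the first row of "merge".
key39 <- vapply(s39, function(d) as.character(d$merge[1]), character(1))
key49 <- vapply(s49, function(d) as.character(d$merge[1]), character(1))

# For each element of s49, the position of the first matching element
# in s39 (NA when there is no match).
idx <- match(key49, key39)
hit <- which(!is.na(idx))

# Stack each matched pair. Using rbind() here is an assumption about
# what "merge" should mean for these data frames.
paired <- lapply(hit, function(j) rbind(s39[[idx[j]]], s49[[j]]))
names(paired) <- key49[hit]
```

The keys are computed once per list instead of once per pair, so the work grows with the number of data frames rather than with the product of the two list lengths.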

a<-do.call("rbind", s39)
b<-do.call("rbind", s49)
c<-rbind(a,b)
d<-split(c, c$merge)

This is another possible solution, but I have millions of records, and it would be very slow.

Reproducible example:

df <- data.frame(col1 = sample(c(1,2), 10, replace = TRUE),
                    col2 = as.factor(sample(10)), col3 = "a")
df2 <- data.frame(col1 = sample(c(1,2), 10, replace = TRUE),
                  col2 = as.factor(sample(10)), col3 = "b")
df3 <- data.frame(col1 = sample(c(1,2), 10, replace = TRUE),
                    col2 = as.factor(sample(10)), col3 = "c")
my.list <- list(df, df2,df3)

df4 <- data.frame(col1 = sample(c(1,2), 10, replace = TRUE),
                 col2 = as.factor(sample(10)), col3 = "c")
df5 <- data.frame(col1 = sample(c(1,2), 10, replace = TRUE),
                  col2 = as.factor(sample(10)), col3 = "d")
df6 <- data.frame(col1 = sample(c(1,2), 10, replace = TRUE),
                  col2 = as.factor(sample(10)), col3 = "a")
my.list2 <- list(df4, df5,df6)

# this is my solution (slow for millions of records)
a<-do.call("rbind", my.list)
b<-do.call("rbind", my.list2)
c<-rbind(a,b)
d<-split(c, c$col3)

Thank you very much.

litas
  • Thanks. Please see this link to learn how to make a reproducible example: https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example. – www Feb 17 '18 at 02:52
  • I have retracted my downvote and given you an upvote as promised. But it is still not fully reproducible. – www Feb 17 '18 at 02:58
  • I added a fully reproducible example (I hope it works). – litas Feb 17 '18 at 03:07
  • Combining and separating factors with different levels is going to cause a lot of warnings (and bad data, if you do it wrong). Unless you've got a good reason to use factors, just keep it as character. – alistaire Feb 17 '18 at 03:14
  • There are optimized versions of `do.call(rbind, ...)` like `dplyr::bind_rows` and `data.table::rbindlist` that would likely be much more efficient, e.g. `library(dplyr); bind_rows(my.list, my.list2) %>% split(.$col3)` – alistaire Feb 17 '18 at 03:15
  • ...although I suspect this is [an XY problem](https://meta.stackexchange.com/questions/66377/what-is-the-xy-problem). There's not usually a reason to keep a list of data frames unless it itself is within a data frame; grouping is more efficient. – alistaire Feb 17 '18 at 03:18
  • The data.table version of the above dplyr: `library(data.table); split(rbindlist(c(my.list, my.list2)), by = 'col3')` – alistaire Feb 17 '18 at 03:21
  • @alistaire Thank you very much for your last answer. It is very useful for my purpose, I do my merges in a few minutes. – litas Feb 17 '18 at 03:50
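
The grouping suggestion in the comments can be illustrated with a short base R sketch. It rebuilds data shaped like the reproducible example (the `make_df` helper and the seed are assumptions added for illustration), binds both lists in a single `rbind` call, and then summarises per group on the one long data frame instead of splitting it into a list. The mean of `col1` is just an example summary, not something the question asks for.

```r
set.seed(1)  # assumption: fixed seed so the sketch is repeatable

# Hypothetical helper that builds one data frame like those in the question.
make_df <- function(label) {
  data.frame(col1 = sample(c(1, 2), 10, replace = TRUE),
             col2 = as.factor(sample(10)),
             col3 = label,
             stringsAsFactors = FALSE)
}

my.list  <- lapply(c("a", "b", "c"), make_df)
my.list2 <- lapply(c("c", "d", "a"), make_df)

# Concatenate the two lists first, then bind all rows in one call.
combined <- do.call(rbind, c(my.list, my.list2))

# Work per group directly on the long data frame, without split().
group_sizes <- table(combined$col3)
group_means <- aggregate(col1 ~ col3, data = combined, FUN = mean)
```

If a list of data frames per group is still needed afterwards, `split(combined, combined$col3)` can be applied to the same `combined` object.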
