
I have a list of 366 data frames, each containing three columns: "i", "j" and "Value". I want to merge these data frames into a single data frame so I can do statistical analysis such as mean, mode and median. Each data frame contains almost the same number of observations.

  • Since they do not have the same observations, perhaps you mean to combine them so that you have one frame with three columns, is that right? Perhaps you want a fourth column to indicate which frame they originally belong to? – r2evans Feb 17 '20 at 21:23
  • https://stackoverflow.com/questions/2851327/convert-a-list-of-data-frames-into-one-data-frame might be what you are looking for. – Ronak Shah Feb 18 '20 at 00:28

2 Answers


Base R options:

set.seed(42)
listdat <- replicate(3, data.frame(i=sample(100, size=2), j=sample(100, size=2), Value=sample(100, size=2)), simplify = FALSE)
str(listdat)
# List of 3
#  $ :'data.frame': 2 obs. of  3 variables:
#   ..$ i    : int [1:2] 92 93
#   ..$ j    : int [1:2] 29 83
#   ..$ Value: int [1:2] 65 52
#  $ :'data.frame': 2 obs. of  3 variables:
#   ..$ i    : int [1:2] 74 14
#   ..$ j    : int [1:2] 66 70
#   ..$ Value: int [1:2] 46 72
#  $ :'data.frame': 2 obs. of  3 variables:
#   ..$ i    : int [1:2] 94 26
#   ..$ j    : int [1:2] 47 94
#   ..$ Value: int [1:2] 98 12

Starting with that, the first thing we can do is just combine them row-wise, all in one go:

do.call(rbind, listdat)
#    i  j Value
# 1 92 29    65
# 2 93 83    52
# 3 74 66    46
# 4 14 70    72
# 5 94 47    98
# 6 26 94    12
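The question also asks for the mean, mode and median. A minimal base R sketch on the combined frame follows, rebuilding the example list so it runs on its own. Base R has no built-in statistical mode (`mode()` reports the storage mode, such as "numeric"), so `stat_mode()` below is a hypothetical helper written for this example; it returns every value tied for the highest count, so if all values are distinct, all of them come back.

```r
# Rebuild the example list from above so this sketch is self-contained
set.seed(42)
listdat <- replicate(3, data.frame(i = sample(100, size = 2),
                                   j = sample(100, size = 2),
                                   Value = sample(100, size = 2)),
                     simplify = FALSE)

# Stack all frames row-wise into one data frame
combined <- do.call(rbind, listdat)

# Hypothetical helper (not part of base R): statistical mode.
# table() counts each distinct value; its names are character strings,
# so convert them back to numbers. Ties return all tied values.
stat_mode <- function(x) {
  counts <- table(x)
  as.numeric(names(counts)[counts == max(counts)])
}

mean(combined$Value)
median(combined$Value)
stat_mode(combined$Value)
```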

It might be nice to record which list element each row came from. If the elements are not named, you can just include the index number:

do.call(rbind, Map(cbind, listdat, num=seq_along(listdat)))
#    i  j Value num
# 1 92 29    65   1
# 2 93 83    52   1
# 3 74 66    46   2
# 4 14 70    72   2
# 5 94 47    98   3
# 6 26 94    12   3
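The `num` column makes per-frame summaries straightforward. As a sketch using only base R's `aggregate()` (the names `per_frame_mean` and `per_frame_median` are illustrative), this computes one statistic per original frame:

```r
set.seed(42)
listdat <- replicate(3, data.frame(i = sample(100, size = 2),
                                   j = sample(100, size = 2),
                                   Value = sample(100, size = 2)),
                     simplify = FALSE)

# Combine, tagging each row with the position of the frame it came from
combined <- do.call(rbind, Map(cbind, listdat, num = seq_along(listdat)))

# One row per source frame, with the mean (or median) of Value in that frame
per_frame_mean   <- aggregate(Value ~ num, data = combined, FUN = mean)
per_frame_median <- aggregate(Value ~ num, data = combined, FUN = median)
per_frame_mean
per_frame_median
```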

If they have names, however, we can use the same technique:

names(listdat) <- c("A","B","C")
do.call(rbind, Map(cbind, listdat, name=names(listdat)))
#      i  j Value name
# A.1 92 29    65    A
# A.2 93 83    52    A
# B.1 74 66    46    B
# B.2 14 70    72    B
# C.1 94 47    98    C
# C.2 26 94    12    C
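With 366 frames, calling `cbind()` once per frame works fine. One alternative, sketched here in base R, is to stack the frames first and then build the label column in a single vectorised step. This is a swapped-in technique rather than the one shown above, and it assumes every list element is a data frame with the same columns.

```r
set.seed(42)
listdat <- replicate(3, data.frame(i = sample(100, size = 2),
                                   j = sample(100, size = 2),
                                   Value = sample(100, size = 2)),
                     simplify = FALSE)
names(listdat) <- c("A", "B", "C")

# Stack once, then repeat each name as many times as its frame has rows
combined <- do.call(rbind, listdat)
combined$name <- rep(names(listdat), times = vapply(listdat, nrow, integer(1)))

# rbind() builds row names such as "A.1"; reset them if they are not wanted
rownames(combined) <- NULL
combined
```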

Per @akrun's commented suggestion, here are two external-package suggestions that are a bit shorter.

# 'dplyr'
dplyr::bind_rows(listdat)                      # if no names present
dplyr::bind_rows(listdat, .id = 'name')        # with names
# 'data.table'
data.table::rbindlist(listdat)                 # if no names present
data.table::rbindlist(listdat, idcol = 'name') # with names
r2evans
  • Or if it is `dplyr`, `bind_rows(listdat, .id = 'name')`, or maybe it is more efficient with `rbindlist(listdat, idcol = 'name')` as the OP has lots of datasets – akrun Feb 17 '20 at 21:38
  • Yes, I thought about that, I just didn't have time up-front to demo those admittedly much-shorter snippets. Thanks! – r2evans Feb 17 '20 at 21:55

Assuming the data sets are in your working directory & have some unique identifier in the filename (e.g. "dataset": "dataset1.csv", "dataset2.csv", "dataset3.csv", etc.), and you don't mind using tidyverse, the following should work:

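library(tidyverse)

# Keep only the file names that contain the identifier, e.g. "dataset1.csv"
# (str_subset() returns the matching names; str_extract() would return
# just "dataset" or NA for each file, which read_csv() cannot use)
file_names <- list.files() %>%
   str_subset("dataset")

# Read each file and stack the results row-wise into one data frame
my_df <- map(file_names, ~ read_csv(.x)) %>% bind_rows()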
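For readers without the tidyverse installed (see the comments below), the same file-reading idea can be sketched in base R. This assumes all files share the same columns and sit in one folder; the temporary directory, the `data_dir` variable and the generated files exist only to make the example self-contained. It also adds a `file` column recording which file each row came from.

```r
# Illustrative setup: write three small CSV files to a temporary folder.
# With real data, point data_dir at your own folder instead.
data_dir <- tempfile("datasets")
dir.create(data_dir)
set.seed(1)
for (k in 1:3) {
  df <- data.frame(i = 1:4, j = 4:1, Value = round(runif(4), 3))
  write.csv(df, file.path(data_dir, paste0("dataset", k, ".csv")),
            row.names = FALSE)
}

# Match only files named like "dataset<something>.csv"
file_names <- list.files(data_dir, pattern = "^dataset.*\\.csv$",
                         full.names = TRUE)

# Read every file, tag rows with their source file, then stack them
frames <- lapply(file_names, function(f) {
  d <- read.csv(f)
  d$file <- basename(f)
  d
})
my_df <- do.call(rbind, frames)
```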

Adam B.
  • Please consider including only the packages that you need. `tidyverse` is a gargantuan meta-package that not everybody has installed or can install. (I have several computers where I cannot install arbitrary packages.) In this case, we only need `dplyr`, `purrr`, `stringr`, and `readr`. Just like we encourage questions to be specific on listing non-base packages, it is also a courtesy to provide the same level of specificity in our answers. Thanks! – r2evans Feb 17 '20 at 21:30
  • Duly noted. I'm a bit of a `tidyverse` partisan, and I think it's especially helpful for new users just starting out with R because it's a lot more human-friendly than base R (which I myself started learning on). For the average, non-expert user who just wants to run some analyses on their laptop and not have to worry about things like optimization, tidyverse is often a life-saver. I'll add an addendum though. – Adam B. Feb 17 '20 at 21:36
  • I'm not arguing against the use of tidyverse packages (that's a completely separate topic), I'm just commenting on the fact that not everybody has tidyverse installed, and that's quite a long/big ordeal if you do it "just" to try this answer. That's all, thanks! – r2evans Feb 17 '20 at 21:54