Combining Dataframes in a list by two columns "i" and "j" to do statistical analysis

Question

I have a list of 366 data frames; each data frame contains 3 columns: "i", "j", and "Value". I want to merge these data frames into a single data frame to do statistical analysis, such as mean, mode, and median. Each data frame contains almost the same number of observations.

Since they do not have the same observations, perhaps you mean to combine them so that you have one frame with three columns, is that right? Perhaps you want a fourth column to indicate which frame each row originally belonged to? – r2evans, Feb 17 '20 at 21:23
https://stackoverflow.com/questions/2851327/convert-a-list-of-data-frames-into-one-data-frame might be what you are looking for. – Ronak Shah, Feb 18 '20 at 00:28

r2evans · Answer 1 · 2020-02-17T21:57:13.650

Base R options:

set.seed(42)
listdat <- replicate(3, data.frame(i=sample(100, size=2), j=sample(100, size=2), Value=sample(100, size=2)), simplify = FALSE)
str(listdat)
# List of 3
#  $ :'data.frame': 2 obs. of  3 variables:
#   ..$ i    : int [1:2] 92 93
#   ..$ j    : int [1:2] 29 83
#   ..$ Value: int [1:2] 65 52
#  $ :'data.frame': 2 obs. of  3 variables:
#   ..$ i    : int [1:2] 74 14
#   ..$ j    : int [1:2] 66 70
#   ..$ Value: int [1:2] 46 72
#  $ :'data.frame': 2 obs. of  3 variables:
#   ..$ i    : int [1:2] 94 26
#   ..$ j    : int [1:2] 47 94
#   ..$ Value: int [1:2] 98 12

Starting with that, the first thing we can do is just combine them row-wise, all in one go:

do.call(rbind, listdat)
#    i  j Value
# 1 92 29    65
# 2 93 83    52
# 3 74 66    46
# 4 14 70    72
# 5 94 47    98
# 6 26 94    12

It might be nice to include which index they came from. If they are not named, then you can just include the index number:

do.call(rbind, Map(cbind, listdat, num=seq_along(listdat)))
#    i  j Value num
# 1 92 29    65   1
# 2 93 83    52   1
# 3 74 66    46   2
# 4 14 70    72   2
# 5 94 47    98   3
# 6 26 94    12   3

If they have names, however, we can use the same technique:

names(listdat) <- c("A","B","C")
do.call(rbind, Map(cbind, listdat, name=names(listdat)))
#      i  j Value name
# A.1 92 29    65    A
# A.2 93 83    52    A
# B.1 74 66    46    B
# B.2 14 70    72    B
# C.1 94 47    98    C
# C.2 26 94    12    C

Per @akrun's commented suggestion, here are two external-package suggestions that are a bit shorter.

# 'dplyr'
dplyr::bind_rows(listdat)                      # if no names present
dplyr::bind_rows(listdat, .id = 'name')        # with names
# 'data.table'
data.table::rbindlist(listdat)                 # if no names present
data.table::rbindlist(listdat, idcol = 'name') # with names
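Once the frames are stacked, the per-cell statistics the question asks about (mean, median, mode for each "i"/"j" pair) can be computed by grouping on those two columns. The following is only a minimal base R sketch under some assumptions: the toy data is made up so that the same (i, j) cells repeat across frames, and `stat_mode` is a hypothetical helper (base R has no built-in statistical mode) that returns the first most-frequent value on ties.

```r
# Hypothetical toy data: three frames that share the same (i, j) cells
listdat <- list(
  data.frame(i = c(1, 1), j = c(1, 2), Value = c(10, 20)),
  data.frame(i = c(1, 1), j = c(1, 2), Value = c(10, 40)),
  data.frame(i = c(1, 1), j = c(1, 2), Value = c(16, 60))
)

# Stack all frames row-wise, as in the answer above
combined <- do.call(rbind, listdat)

# Hypothetical helper: most frequent value (first one wins on ties)
stat_mode <- function(x) {
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}

# One row per (i, j) pair, with several statistics of Value
stats <- aggregate(
  Value ~ i + j, data = combined,
  FUN = function(v) c(mean = mean(v), median = median(v), mode = stat_mode(v))
)

# aggregate() stores the statistics in a matrix column; flatten it into
# ordinary columns named Value.mean, Value.median, Value.mode
stats <- do.call(data.frame, stats)
print(stats)
```

With 366 real frames the same call applies unchanged; only the construction of `listdat` differs.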

Or, with `dplyr`, `bind_rows(listdat, .id = 'name')`; it may be more efficient to use `rbindlist(listdat, idcol = 'name')`, as the OP has lots of datasets – akrun, Feb 17 '20 at 21:38
Yes, I thought about that, I just didn't have time up-front to demo those admittedly much-shorter snippets. Thanks! – r2evans, Feb 17 '20 at 21:55

Adam B. · Answer 2 · 2020-02-17T21:37:12.837

Assuming the data sets are in your working directory and share a common identifier in their file names (e.g. "dataset": "dataset1.csv", "dataset2.csv", "dataset3.csv", etc.), and you don't mind using tidyverse, the following should work:

library(tidyverse)

# keep only the file names that contain "dataset"
file_names <- list.files() %>%
   str_subset("dataset")

# read each file and stack the results row-wise
my_df <- map(file_names, ~ read_csv(.x)) %>% bind_rows()
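For readers who cannot install tidyverse, the same idea can be sketched with base R only. This is an illustration under stated assumptions, not part of the original answer: the files are written to a temporary directory just to make the example runnable, the directory name `dataset_demo` and the `source` column are made up, and the file name pattern follows the "dataset1.csv" convention assumed above.

```r
# Create a few hypothetical CSV files in a temporary directory for the demo
tmp <- file.path(tempdir(), "dataset_demo")
dir.create(tmp, showWarnings = FALSE)
for (k in 1:3) {
  write.csv(
    data.frame(i = 1:2, j = 1:2, Value = k * c(1, 2)),
    file.path(tmp, sprintf("dataset%d.csv", k)),
    row.names = FALSE
  )
}

# Find the files whose names start with "dataset" and end in ".csv"
files <- list.files(tmp, pattern = "^dataset.*\\.csv$", full.names = TRUE)

# Read each file into a list of data frames, named by file
frames <- lapply(files, read.csv)
names(frames) <- basename(files)

# Stack them row-wise, recording the source file in an extra column
my_df <- do.call(rbind, Map(cbind, frames, source = names(frames)))
str(my_df)
```

The `source` column plays the same role as the `name` column in the other answer, so later grouping by file remains possible.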

edited Feb 17 '20 at 21:37

answered Feb 17 '20 at 21:26


Please consider including only the packages that you need. `tidyverse` is a gargantuan meta-package that not everybody has installed or can install. (I have several computers where I cannot install arbitrary packages.) In this case, we only need `dplyr`, `purrr`, `stringr`, and `readr`. Just like we encourage questions to be specific on listing non-base packages, it is also a courtesy to provide the same level of specificity in our answers. Thanks! – r2evans Feb 17 '20 at 21:30

Duly noted. I'm a bit of a `tidyverse` partisan: I think it's especially helpful for new users just starting out with R because it's a lot more human-friendly than base R (which I myself started learning on). For the average, non-expert user who just wants to run some analyses on their laptop and not worry about things like optimization, tidyverse is often a lifesaver. I'll add an addendum, though. – Adam B. Feb 17 '20 at 21:36

I'm not arguing against the use of tidyverse packages (that's a completely separate topic), I'm just commenting on the fact that not everybody has tidyverse installed, and that's quite a long/big ordeal if you do it "just" to try this answer. That's all, thanks! – r2evans Feb 17 '20 at 21:54
