How to join multiple tables with different lengths and add a unique ID

Question

I have a folder with multiple .txt files ("qc11025010.txt" "qc11035010.txt" "qc11035020.txt" "qc11045010.txt"....) containing meteorological data from various stations. Each of these files has 6 columns (year, month, day, precipitation, maximum temperature and minimum temperature) and a different number of rows.

files[1]
    V1 V2 V3  V4    V5    V6
1 1983  3  5  92.0 -99.9 -99.9
2 1983  3  6 141.0 -99.9  23.0
3 1983  3  7  61.3 -99.9  18.6
4 1983  3  8  10.7 -99.9 -99.9
5 1983  3  9   0.0 -99.9 -99.9
6 1983  3 10   0.0 -99.9 -99.9

files[2]
    V1 V2 V3   V4    V5    V6
1 1983  3 15  0.6 -99.9 -99.9
2 1983  3 16 29.4  33.8  24.8
3 1983  3 17 23.2  28.0 -99.9
4 1983  3 18  0.6 -99.9  23.0
5 1983  3 19  0.5  33.8  23.4
6 1983  3 20  0.0  33.2  22.2

library(dplyr)

files <- list.files(path = folder, pattern = "txt")

dat <- read.table(files[1])
dat$ID <- rep(as.character(files[1])) 

for (x in files[2:278]){
  tb <- lapply(x, read.table, header=F)
  tb$ID <- rep(as.character(x))
  res <- rbind(datos, tb)
  colnames(res) <- c("YEAR","MONTH","DAY","PCP","TMAX","TMIN", "ID")

}

Then I get the following error:

Error in rbind(deparse.level, ...) : invalid list argument: all variables should have the same length

I want to join the tables and add a unique ID, in the following form:

 YEAR MONTH DAY  PCP TMAX  TMIN  ID
 1983  3     5  92.0 -99.9 -99.9  qc11025010.txt
 1983  3     6 141.0 -99.9  23.0  qc11025010.txt
 1983  3     7  61.3 -99.9  18.6  qc11025010.txt
.....
 1983  3    15  0.6 -99.9 -99.9   qc11045010.txt
 1983  3    16 29.4  33.8  24.8   qc11045010.txt
 1983  3    17 23.2  28.0 -99.9   qc11045010.txt

what is `datos` in the `rbind` call? it is not defined in your code. in general you should use an `apply` or `map` function to do this kind of thing anyway instead of looping through objects in your global environment — Calum You, Feb 16 '19 at 00:11
Here's a nice tutorial on how to do this more simply with `purrr`, including a way to include the source file name: https://serialmentor.com/blog/2016/6/13/reading-and-combining-many-tidy-data-files-in-R — Jon Spring, Feb 16 '19 at 00:22

score 1 · Accepted Answer · answered Feb 16 '19 at 07:08

Loading multiple files into a single list and then concatenating them row-wise is easily done with lapply and/or Map, followed by a wrap-up rbind.
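For context on the error in the question: a likely cause is that `lapply(x, read.table)` on a single file name returns a one-element list rather than a data frame, so `tb$ID <- ...` appends a list element instead of a column, and `rbind` then sees components of unequal length. The loop also refers to `datos`, which is never defined, and overwrites `res` on every pass. A minimal sketch of the difference, using a temporary demo file whose contents are made up for illustration:

```r
# Write one small headerless file to a temporary location (demo data only)
tmp <- tempfile(fileext = ".txt")
write.table(data.frame(1983, 3, 5:7, c(92.0, 141.0, 61.3)),
            tmp, row.names = FALSE, col.names = FALSE)

# lapply over a single file name gives a list holding one data frame
tb <- lapply(tmp, read.table, header = FALSE)
class(tb)                 # "list", not "data.frame"
tb$ID <- basename(tmp)    # appends a second list element, not a column
length(tb)                # 2: the data frame plus the ID string

# Calling read.table directly gives a data frame
ok <- read.table(tmp, header = FALSE)
ok$ID <- basename(tmp)    # recycled to one value per row
dim(ok)                   # 3 rows, 5 columns
```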

Setup: I'll create three files in the current directory:

ign <- sapply(1:3, function(i) write.csv(mtcars[1:3, 1:4], file=paste0(i, ".csv"),
                                         row.names=FALSE))
# pattern is a regular expression, not a glob
lof <- list.files(path=".", pattern="\\.csv$", full.names=TRUE)
lof
# [1] "./1.csv" "./2.csv" "./3.csv"

Now using that data, we first load in the data:

dat <- lapply(lof, read.csv)
dat[1]
# [[1]]
#    mpg cyl disp  hp
# 1 21.0   6  160 110
# 2 21.0   6  160 110
# 3 22.8   4  108  93

and then column-bind the filenames to each one, in a "zipper"-like fashion, with Map:

Map(cbind, dat, filename = basename(lof))
# [[1]]
#    mpg cyl disp  hp filename
# 1 21.0   6  160 110    1.csv
# 2 21.0   6  160 110    1.csv
# 3 22.8   4  108  93    1.csv
# [[2]]
#    mpg cyl disp  hp filename
# 1 21.0   6  160 110    2.csv
# 2 21.0   6  160 110    2.csv
# 3 22.8   4  108  93    2.csv
# [[3]]
#    mpg cyl disp  hp filename
# 1 21.0   6  160 110    3.csv
# 2 21.0   6  160 110    3.csv
# 3 22.8   4  108  93    3.csv

We can add this new column and row-bind the results in one step, using

do.call("rbind.data.frame", c(Map(cbind, dat, filename = basename(lof)), stringsAsFactors = FALSE))
#    mpg cyl disp  hp filename
# 1 21.0   6  160 110    1.csv
# 2 21.0   6  160 110    1.csv
# 3 22.8   4  108  93    1.csv
# 4 21.0   6  160 110    2.csv
# 5 21.0   6  160 110    2.csv
# 6 22.8   4  108  93    2.csv
# 7 21.0   6  160 110    3.csv
# 8 21.0   6  160 110    3.csv
# 9 22.8   4  108  93    3.csv

This uses stringsAsFactors=FALSE so that we don't have to deal with (incompatible) factors. It could just as easily have been done with data.table::rbindlist or dplyr::bind_rows.
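Applied to the headerless station files from the question, the same pattern might look like the sketch below. The demo folder, the two sample files, and the helper `add_id` are assumptions made so the example runs on its own; in practice `folder` would point at the real directory and the two `write.table` calls would be dropped.

```r
# Hypothetical demo folder with two station files of different lengths
folder <- file.path(tempdir(), "stations_demo")
dir.create(folder, showWarnings = FALSE)
write.table(data.frame(1983, 3, 5:7, c(92.0, 141.0, 61.3), -99.9, c(-99.9, 23.0, 18.6)),
            file.path(folder, "qc11025010.txt"), row.names = FALSE, col.names = FALSE)
write.table(data.frame(1983, 3, 15:16, c(0.6, 29.4), c(-99.9, 33.8), c(-99.9, 24.8)),
            file.path(folder, "qc11035010.txt"), row.names = FALSE, col.names = FALSE)

# full.names = TRUE so read.table can find the files from any working directory
files <- list.files(path = folder, pattern = "\\.txt$", full.names = TRUE)

# Read every file into a list of data frames, naming the six columns on the way in
tables <- lapply(files, read.table, header = FALSE,
                 col.names = c("YEAR", "MONTH", "DAY", "PCP", "TMAX", "TMIN"))

# Hypothetical helper: attach the file name as a character ID column
add_id <- function(df, id) {
  df$ID <- id
  df
}

# Pair each table with its file name, then stack all tables row-wise
res <- do.call(rbind, Map(add_id, tables, basename(files)))
str(res)
```

Because each table is bound row-wise only after it has its own `ID` column, the differing row counts per file are not a problem.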
