As @user5249203 commented, if you can tell before loading (from the filename or something else) that a file has too many columns, then you can skip the extra columns programmatically at read time. If not, read on.
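For the first case, here is a minimal sketch of skipping columns at read time. It assumes the extra columns are the leading ones (as in the rest of this answer) and relies on `read.csv`'s `colClasses` argument, where `"NULL"` means "skip this column". The helper name `read_last_cols` and its `wanted` parameter are made up for illustration.

```r
# Hypothetical helper: read a CSV, skipping leading columns beyond `wanted`.
read_last_cols <- function(path, wanted) {
  # peek at the header plus one row, just to count the columns in this file
  n <- ncol(read.csv(path, nrows = 1))
  extra <- max(n - wanted, 0)
  # "NULL" skips a column entirely; NA lets read.csv guess the type
  classes <- c(rep("NULL", extra), rep(NA, n - extra))
  read.csv(path, colClasses = classes, stringsAsFactors = FALSE)
}

# demo on a temporary file, since I have no real files to read
tmp <- tempfile(fileext = ".csv")
write.csv(mtcars, tmp, row.names = FALSE)
demo <- read_last_cols(tmp, wanted = 10)
names(demo)
```

With your real files this would be something like `lapply(fnames, read_last_cols, wanted = 18)`, after which every table should already have the same width.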
I'm going to assume that you are reading in your files using something like this:
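# `pattern` is a regular expression (not a glob); full.names = TRUE returns paths that read.csv can open
fnames <- list.files(path = "some/dir", pattern = "\\.csv$", full.names = TRUE)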
# replace `read.csv` with whichever function you're using to read in the data
alldata <- sapply(fnames, read.csv, stringsAsFactors = FALSE, simplify = FALSE)
Lacking any files to read like that, I'll generate a fake alldata list:
set.seed(42)
fnames <- paste0("mtcars", 1:5)
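alldata <- sapply(fnames, function(fn) {
  # randomly drop the first column (mpg) from some of the tables
  if (runif(1) < 0.7) mtcars[, -1] else mtcars
}, simplify = FALSE) # always return a list, even if every table ends up the same width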
# should have 3 with 11 columns, 2 with 10 columns
sapply(alldata, ncol)
# mtcars1 mtcars2 mtcars3 mtcars4 mtcars5
#      11      11      10      11      10
No surprise, we can't rbind them using base R:
do.call("rbind", alldata)
# Error in rbind(deparse.level, ...) :
# numbers of columns of arguments do not match
dplyr
We can however use dplyr::bind_rows, though it will retain the unwanted column and fill it with NA for the rows that come from the narrower tables:
library(dplyr)
str( bind_rows(alldata) )
# 'data.frame': 160 obs. of 11 variables:
# $ mpg : num 21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
# $ cyl : num 6 6 4 6 8 6 8 4 4 6 ...
# $ disp: num 160 160 108 258 360 ...
# $ hp : num 110 110 93 110 175 105 245 62 95 123 ...
# $ drat: num 3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
# $ wt : num 2.62 2.88 2.32 3.21 3.44 ...
# $ qsec: num 16.5 17 18.6 19.4 17 ...
# $ vs : num 0 0 1 1 0 1 0 1 1 1 ...
# $ am : num 1 1 1 0 0 0 0 0 0 0 ...
# $ gear: num 4 4 4 3 3 3 3 4 4 4 ...
# $ carb: num 4 4 1 1 2 1 4 2 2 4 ...
What you don't see in this str summary is that some of the mpg values are NA:
table(is.na(bind_rows(alldata)$mpg))
# FALSE TRUE
# 96 64
(Remove the mpg column afterwards if desired.)
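A sketch of that removal, rebuilding the same fake list so the snippet runs on its own. The `.id = "source"` argument is optional; I include it only because knowing which list element each row came from is often handy, and the column name `source` is my own choice.

```r
library(dplyr)

# same fake list as above: some tables lack the leading 'mpg' column
set.seed(42)
fnames <- paste0("mtcars", 1:5)
alldata <- sapply(fnames, function(fn) {
  if (runif(1) < 0.7) mtcars[, -1] else mtcars
}, simplify = FALSE)

combined <- bind_rows(alldata, .id = "source") %>% # 'source' holds the list name of each table
  select(-mpg)                                     # drop the partially-NA column

names(combined)
```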
Base R
(This assumes you chose not to use dplyr.) Start from here with your actual alldata list:
numColumnsWanted <- 10 # you want this to be 18, I think
alldata2 <- lapply(alldata, function(dat) {
  # keep only the *last* 'numColumnsWanted' columns of the wider tables
  n <- ncol(dat)
  if (n > numColumnsWanted) dat[, (n - numColumnsWanted + 1):n, drop = FALSE] else dat
})
Verify that the data.frames all have the same number of columns (you should probably also verify the column names):
sapply(alldata2, ncol)
# mtcars1 mtcars2 mtcars3 mtcars4 mtcars5
#      10      10      10      10      10
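Checking the column names could look something like the sketch below. The helper `hasSameNames` is hypothetical (not from any package), and the two small lists are stand-ins so the snippet runs on its own.

```r
# hypothetical helper: TRUE if every data.frame in 'lst' has the same
# column names, in the same order, as the first one
hasSameNames <- function(lst) {
  reference <- names(lst[[1]])
  all(vapply(lst, function(dat) identical(names(dat), reference), logical(1)))
}

# stand-in lists: 'good' has matching layouts, 'bad' drops a different column
good <- list(a = mtcars[, -1], b = mtcars[, -1])
bad  <- list(a = mtcars[, -1], b = mtcars[, -2])

hasSameNames(good) # TRUE
hasSameNames(bad)  # FALSE
```

On the real data that would be `hasSameNames(alldata2)`. If it returns FALSE, the tables have the same width but different layouts, and rbind will either fail or line up the wrong columns.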
Now you should be able to rbind them safely:
str( do.call("rbind", alldata2) )
# 'data.frame': 160 obs. of 10 variables:
# $ cyl : num 6 6 4 6 8 6 8 4 4 6 ...
# $ disp: num 160 160 108 258 360 ...
# $ hp : num 110 110 93 110 175 105 245 62 95 123 ...
# $ drat: num 3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
# $ wt : num 2.62 2.88 2.32 3.21 3.44 ...
# $ qsec: num 16.5 17 18.6 19.4 17 ...
# $ vs : num 0 0 1 1 0 1 0 1 1 1 ...
# $ am : num 1 1 1 0 0 0 0 0 0 0 ...
# $ gear: num 4 4 4 3 3 3 3 4 4 4 ...
# $ carb: num 4 4 1 1 2 1 4 2 2 4 ...
(The mpg column is not present in this solution.)
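If the extra columns are not reliably at the front, selecting by name may be more robust than selecting by position. The sketch below is a variation on the above, not a replacement: it assumes the unwanted columns are exactly those that are missing from at least one table, which may not hold for your files. The names `commonCols`, `alldata3` and `combined3` are just for illustration.

```r
# same fake list as above
set.seed(42)
fnames <- paste0("mtcars", 1:5)
alldata <- sapply(fnames, function(fn) {
  if (runif(1) < 0.7) mtcars[, -1] else mtcars
}, simplify = FALSE)

# column names that appear in every table
commonCols <- Reduce(intersect, lapply(alldata, names))

# keep only those columns, in a consistent order, then stack the tables
alldata3 <- lapply(alldata, function(dat) dat[, commonCols, drop = FALSE])
combined3 <- do.call("rbind", alldata3)
dim(combined3)
```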