I want to read in the first 3 columns from many files where I don't necessarily know the number of columns each file contains. Additionally, I don't exactly know the number of lines to skip in each file, though it won't be more than 19 before the header line.
My question is similar to these questions:
- Only read limited number of columns
- Ways to read only select columns from a file into R? (A happy medium between `read.table` and `scan`?).
But I have the different problem of not knowing the number of columns in the files I want to import or the exact number of rows to skip. I only want to import the first three columns from every file, which are consistently named (`Date/Time`, `Unit`, `Value`).
The `read.table` solutions to the linked questions require knowing the number of columns in your file and specifying the `colClasses` for each column. I am attempting to read thousands of files via an approach with `lapply`, where the input is a list of .csv files, and use `read.table` on each file:
lapply(files, read.table, skip = 19, header = TRUE, sep = ",")
# Secondary issue: the number of lines to skip varies from file to file (up to 19).
Is there a way of getting around the problem of not knowing the number of columns ahead of time?
EDIT: I have modified the answer provided by @asb to suit my problem and it works perfectly.
my.read.table <- function(file, sep = ",", colClasses3 = c("factor", "factor", "numeric"), ...) {
  ## read the first 20 lines once; the header line is somewhere among them
  head.lines <- readLines(file, n = 20)
  ## index of the header line, i.e. the line where "Date/Time,Unit,Value" appears
  header.idx <- grep("Date/Time,Unit,Value", head.lines, fixed = TRUE)[1]
  ## determine the number of lines to skip (max possible = 19)
  skip.number <- header.idx - 1
  ## split the header line on the separator to find the number of columns;
  ## fixed = TRUE avoids needing to escape the separator
  ncols <- length(strsplit(head.lines[header.idx], sep, fixed = TRUE)[[1]])
  ## use ncols in the `colClasses` argument: "NULL" drops every column after the third
  out <- read.table(file, sep = sep, header = TRUE, skip = skip.number,
                    colClasses = c(colClasses3, rep("NULL", ncols - 3)), ...)
  out
}
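For anyone who wants to try the idea without my data, here is a minimal, self-contained sketch of the same steps (find the header line, derive the skip count and the column count, then drop the extra columns with `"NULL"` classes). The helper name `read_first_three`, its `header_pattern` argument, and the two temporary files are made up for illustration; I also read the first two columns as `character` rather than `factor`, which is only a choice for this example.

```r
# Hypothetical helper (name and arguments are assumptions for this sketch).
read_first_three <- function(file, sep = ",",
                             header_pattern = "Date/Time,Unit,Value") {
  # Look for the header within the first 20 lines only.
  head_lines <- readLines(file, n = 20)
  header_idx <- grep(header_pattern, head_lines, fixed = TRUE)[1]
  if (is.na(header_idx)) {
    stop("header line not found in the first 20 lines of ", file)
  }
  # Count the columns by splitting the header line on the separator.
  ncols <- length(strsplit(head_lines[header_idx], sep, fixed = TRUE)[[1]])
  # Skip everything before the header; "NULL" classes drop columns 4 and up.
  read.table(file, sep = sep, header = TRUE, skip = header_idx - 1,
             colClasses = c("character", "character", "numeric",
                            rep("NULL", ncols - 3)))
}

# Two made-up files with different preamble lengths and column counts.
f1 <- tempfile(fileext = ".csv")
writeLines(c("logger A", "site 1",
             "Date/Time,Unit,Value,Flag",
             "2013-01-01 00:00,C,1.5,ok",
             "2013-01-01 01:00,C,1.7,ok"), f1)
f2 <- tempfile(fileext = ".csv")
writeLines(c("logger B",
             "Date/Time,Unit,Value,Flag,Comment",
             "2013-01-02 00:00,C,2.5,ok,none"), f2)

# Same lapply pattern as in the question, applied to both files.
results <- lapply(c(f1, f2), read_first_three)
```

Each element of `results` should be a data frame with only the first three columns. Note that `read.table` applies `check.names = TRUE` by default, so the `Date/Time` header comes back as `Date.Time`.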