I'm trying to read in several CSVs whose headers begin on different rows and then combine them into one data frame. I tried the code from this related question, but I couldn't get the function to work:
Read CSV into R based on where header begins
Here are two example DFs:
file1 <- structure(list(X..Text = c("# Text", "#", "agency_cd", "5s",
"USGS", "USGS"), X = c("", "", "site_no", "15s", "4294000", "4294000"
), X.1 = c("", "", "datetime", "20d", "6/24/13 0:00", "6/24/13 0:15"
), X.2 = c("", "", "tz_cd", "6s", "EDT", "EDT"), X.3 = c("",
"", "Gage height", "14n", "1.63", "1.59"), X.4 = c("", "", " Discharge",
"14n", "1310", "1250")), class = "data.frame", row.names = c(NA,
-6L))
file2 <- structure(list(X..Text = c("# Text", "# Text", "#", "agency_cd",
"5s", "USGS", "USGS"), X = c("", "", "", "site_no", "15s", "4294002",
"4294002"), X.1 = c("", "", "", "datetime", "20d", "6/24/13 0:00",
"6/24/13 0:15"), X.2 = c("", "", "", "tz_cd", "6s", "EDT", "EDT"
), X.3 = c("", "", "", "Gage height", "14n", "1.63", "1.59"),
X.4 = c("", "", "", " Discharge", "14n", "1310", "1250")), class =
"data.frame", row.names = c(NA,
-7L))
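To make the goal concrete, here is a minimal base R sketch of the desired cleanup, applied to a data frame shaped like the examples above. The helper name `promote_header` and the stand-in data frame `raw` are hypothetical, made up for illustration. It assumes the header row is the first row whose first column equals "agency_cd" and that exactly one format row (e.g. "5s", "15s") follows it.

```r
# Hypothetical helper: promote the row containing `column_name` to the header
# and drop the format row (e.g. "5s", "15s") that follows it.
promote_header <- function(df, column_name = "agency_cd") {
  # Index of the first row whose first column equals column_name
  header_row <- which(df[[1]] == column_name)[1]
  if (is.na(header_row)) stop("header row not found")
  # Header values as character, trimmed (" Discharge" has a leading space)
  new_names <- trimws(unname(vapply(df[header_row, ], as.character, character(1))))
  # Keep only the rows after the header row and the format row
  out <- df[seq_len(nrow(df)) > header_row + 1, , drop = FALSE]
  names(out) <- new_names
  rownames(out) <- NULL
  out
}

# Small stand-in with the same layout as file1 (fewer columns, assumed data)
raw <- data.frame(
  X..Text = c("# Text", "#", "agency_cd", "5s", "USGS", "USGS"),
  X = c("", "", "site_no", "15s", "4294000", "4294000"),
  X.1 = c("", "", " Discharge", "14n", "1310", "1250"),
  stringsAsFactors = FALSE
)
clean <- promote_header(raw)
```

Here `clean` should hold only the two USGS data rows, with `agency_cd`, `site_no`, and `Discharge` as column names.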
I would like to use a solution similar to the one in the related question above, but I also need to skip the line after the header (the header row is the row that starts with "agency_cd"). Then I want to do something like the following to bind all the CSVs into one data frame, with the file names in a column:
library(tidyverse)

# Path to the data
data_path <- "Data/folder1/folder2"
# Bind all files together to form one data frame
discharge <-
  # Find all file names ending in .csv in all subfolders
  dir(data_path, pattern = "\\.csv$", recursive = TRUE) %>%
  # Create a tibble holding the file names
  tibble(filename = .) %>%
  # Read each CSV file into a list-column, with every column as character
  mutate(file_contents = map(filename, ~ read_csv(file.path(data_path, .), col_types = cols(.default = "c")))) %>%
  # Unpack the list-column to make a useful data frame
  unnest(file_contents)
With the example function from the related question, I have two problems: A) I can't get header_begins to come back as a vector, and B) I don't know how to incorporate the function into the read_csv call above.
As a start I tried this using the solution to the related question:
# Function
detect_header_line <- function(file_names, column_name) {
  header_begins <- NULL
  for (i in 1:length(file_names)) {
    lines_read <- readLines(file_names[i], warn = FALSE)
    header_begins[i] <- grep(column_name, lines_read)
  }
}
# Path to the data
data_path <- "Data/RACC_2012-2016/discharge"
# Get all CSV file names
file_names <- dir(data_path, pattern = "\\.csv$", recursive = TRUE)
# Get beginning rows of each CSV file
header_begins <- detect_header_line(file.path(data_path, file_names), 'agency_cd')
But header_begins came back empty. Even once that is fixed, I still need help incorporating it into my code above.
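A possible explanation, offered as a sketch rather than a verified fix: the function above ends with the for loop and never returns header_begins, so the call yields NULL. The base R version below returns one line number per file and then uses it to read each file, drop the format line after the header, and bind everything with a filename column. The helper names `read_after_header` and `read_all`, the demo files, and the assumption that the header line starts with the column name (optionally quoted) are all illustrative. It uses `read.csv` as a stand-in for `read_csv`, and `rbind` assumes every file has the same columns.

```r
# Return the header line number for each file (one integer per file).
detect_header_line <- function(file_names, column_name) {
  vapply(file_names, function(f) {
    lines_read <- readLines(f, warn = FALSE)
    # First line starting with the column name, optionally quoted (assumption)
    grep(paste0("^\"?", column_name), lines_read)[1]
  }, integer(1), USE.NAMES = FALSE)
}

# Hypothetical helper: read one file starting at its header line,
# skipping the format line (e.g. "5s,15s,...") right after the header.
read_after_header <- function(file, header_line) {
  if (is.na(header_line)) stop("No header line found in ", file)
  lines_read <- readLines(file, warn = FALSE)
  keep <- lines_read[-c(seq_len(header_line - 1), header_line + 1)]
  read.csv(text = keep, colClasses = "character",
           strip.white = TRUE, check.names = FALSE)
}

# Hypothetical helper: read every CSV under data_path and bind the results,
# recording the relative file name in a `filename` column.
read_all <- function(data_path, column_name = "agency_cd") {
  file_names <- dir(data_path, pattern = "\\.csv$", recursive = TRUE)
  paths <- file.path(data_path, file_names)
  header_begins <- detect_header_line(paths, column_name)
  pieces <- Map(function(name, path, line) {
    data.frame(filename = name, read_after_header(path, line),
               check.names = FALSE, stringsAsFactors = FALSE)
  }, file_names, paths, header_begins)
  out <- do.call(rbind, pieces)
  rownames(out) <- NULL
  out
}

# Demo on two made-up files with headers on different lines
demo_dir <- file.path(tempdir(), "discharge_demo")
dir.create(demo_dir, showWarnings = FALSE)
writeLines(c("# Text,,", "#,,", "agency_cd,site_no, Discharge", "5s,15s,14n",
             "USGS,4294000,1310", "USGS,4294000,1250"),
           file.path(demo_dir, "file1.csv"))
writeLines(c("# Text,,", "# Text,,", "#,,", "agency_cd,site_no, Discharge",
             "5s,15s,14n", "USGS,4294002,1310"),
           file.path(demo_dir, "file2.csv"))

header_begins <- detect_header_line(
  file.path(demo_dir, c("file1.csv", "file2.csv")), "agency_cd")
discharge <- read_all(demo_dir)
```

The same idea should carry over to the tidyverse pipeline above, since `read_csv` also has `skip` and `col_names` arguments, but I have not worked out that version.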
Any help is greatly appreciated!