I have numerous .csv files saved in one folder on my PC. I create a list of these datasets as follows:
> file_list <- list.files()
> file_list
[1] "ABWAbwut50.csv" "ABWEinfam50.csv" "ABWFeldwaldasph50.csv" "ABWGarage50.csv"
[5] "ABWGemeindestr50.csv" "ABWHotel50.csv" "ABWInd50.csv" "ABWIntflaechen50.csv"
[9] "ABWKantonsstr50.csv" "ABWMehrfam50.csv" "ABWNutzwald50.csv" "ABWSchutzwald50.csv"
[13] "ABWstahlmitvieh50.csv" "ABWStromut50.csv" "ABWWeideland50.csv"
The .csv files contain identical columns; decimals use "." and columns are separated by ";". I tried to combine these datasets using the following code:
for (file in file_list){
  if (!exists("dataset")){
    dataset <- read_delim(file, ";", escape_double = FALSE, trim_ws = TRUE)
  }
}
dataset
but it only reads the first file. How can I get it to combine all 15 .csv files into one data frame?
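For context, the loop stops after the first file because dataset already exists from the second iteration on, and there is no else branch that appends the remaining files. Below is a minimal sketch of one way to read and stack every file, using base R only. The helper name read_all and the two temporary demo files are made up for illustration, and the sketch assumes every file has the same columns:

```r
# Hypothetical helper: read each ";"-separated file and stack the rows.
# Assumes all files share the same column names and types.
read_all <- function(files) {
  tables <- lapply(files, function(f) {
    read.table(f, header = TRUE, sep = ";", dec = ".",
               stringsAsFactors = FALSE)
  })
  do.call(rbind, tables)
}

# Demo with two small temporary files standing in for the real ones.
demo_dir <- tempfile("csv_demo")
dir.create(demo_dir)
writeLines(c("PID;NK", "2639;0.5", "2639;1.5"), file.path(demo_dir, "a.csv"))
writeLines(c("PID;NK", "2640;2.5"), file.path(demo_dir, "b.csv"))

demo_files <- list.files(demo_dir, pattern = "\\.csv$", full.names = TRUE)
combined <- read_all(demo_files)
```

With the real data, the call would presumably be dataset <- read_all(file_list).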
When I run different code, I get the following error message:
> View(dataset)
> dataset <- do.call("rbind",lapply(file_list,
+ FUN=function(files){read.table(files,
+ header=TRUE, sep=";")}))
Error in scan(file = file, what = what, sep = sep, quote = quote, dec = dec, :
line 103 did not have 8 elements
I assume something went wrong and one of the files has only 7 columns instead of 8 (actually, I know it affects only a couple of rows within one file). I do not want to look into every file separately to find the anomaly. How can I have the lines that do not follow the pattern removed automatically?
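To make the goal concrete, here is a rough sketch of what such automatic filtering might look like in base R: count the separators on each raw line and keep only the lines whose field count matches the header. The helper name read_clean and the temporary demo file are hypothetical, and the sketch assumes that no ";" occurs inside a quoted field:

```r
# Hypothetical helper: read one file, dropping rows whose field count
# differs from the header's. Assumes no ";" occurs inside quoted fields.
read_clean <- function(path, sep = ";") {
  lines <- readLines(path)
  # Number of fields = number of separators on the line + 1.
  n_fields <- lengths(regmatches(lines, gregexpr(sep, lines, fixed = TRUE))) + 1
  keep <- n_fields == n_fields[1]
  if (any(!keep)) {
    # Report which raw line numbers were skipped.
    message(basename(path), ": dropped line(s) ",
            paste(which(!keep), collapse = ", "))
  }
  read.table(text = lines[keep], header = TRUE, sep = sep, dec = ".",
             stringsAsFactors = FALSE)
}

# Demo: the third line of this temporary file has 2 fields instead of 3.
demo_file <- tempfile(fileext = ".csv")
writeLines(c("PID;Case;Prozess",
             "2639;1;Murgang",
             "2639;1",
             "2640;2;Murgang"), demo_file)
cleaned <- read_clean(demo_file)
```

Applied to all files, this could presumably be combined as dataset <- do.call(rbind, lapply(file_list, read_clean)). The message output lists the dropped line numbers per file, so the anomalies can still be inspected later.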
My data file looks something like this:
> dput(dataset[1:10,])
structure(list(Berechnung = c("EconoMe original", "Berechnung 1",
"Berechnung 2", "Berechnung 3", "Berechnung 4", "Berechnung 5",
"Berechnung 6", "Berechnung 7", "Berechnung 8", "Berechnung 9"
), Situation = c("Nach Massnahme Neue Gerinnefuehrung Gafenbach",
"Nach Massnahme Neue Gerinnefuehrung Gafenbach", "Nach Massnahme Neue Gerinnefuehrung Gafenbach",
"Nach Massnahme Neue Gerinnefuehrung Gafenbach", "Nach Massnahme Neue Gerinnefuehrung Gafenbach",
"Nach Massnahme Neue Gerinnefuehrung Gafenbach", "Nach Massnahme Neue Gerinnefuehrung Gafenbach",
"Nach Massnahme Neue Gerinnefuehrung Gafenbach", "Nach Massnahme Neue Gerinnefuehrung Gafenbach",
"Nach Massnahme Neue Gerinnefuehrung Gafenbach"), NK = c(0, 0,
0, 0, 0, 0, 0, 0, 0, 0), PID = c(2639L, 2639L, 2639L, 2639L,
2639L, 2639L, 2639L, 2639L, 2639L, 2639L), Case = c(1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L), Differenz = c(0, 0, 0, 0, 0, 0,
0, 0, 0, 0), Prozess = c("Murgang", "Murgang", "Murgang", "Murgang",
"Murgang", "Murgang", "Murgang", "Murgang", "Murgang", "Murgang"
), Objektart = c("Abwasser unter Terrain", "Abwasser unter Terrain",
"Abwasser unter Terrain", "Abwasser unter Terrain", "Abwasser unter Terrain",
"Abwasser unter Terrain", "Abwasser unter Terrain", "Abwasser unter Terrain",
"Abwasser unter Terrain", "Abwasser unter Terrain")), .Names = c("Berechnung",
"Situation", "NK", "PID", "Case", "Differenz", "Prozess", "Objektart"
), row.names = c(NA, -10L), class = c("tbl_df", "tbl", "data.frame"