I have a file and there are many csv
data in it.
I want to read them and create new columns at one time and then combine to one datatable. I explain more here.
- Look at this pic:
I want to create 2 new columns
YEAR
andMONTH
based on the csv data title.
ex. Take201508 Sales Report(London)
as an example. I want to createYEAR = 2015
andMONTH = 8
.I don't know how to do but I can read them at one time without create new columns.
my_read_data <- function(path){ data <- data.table::fread(path, header = T, strip.white = T, fill = T) data <- data[data[[5]] != 0,] data <- subset(data, select = c(-1,-7,-10,-12,-13,-14,-15,-17)) } file.list <- dir(path = "//path/", pattern='\\.csv', full.names = T) df.list <- lapply(file.list, my_read_data) dt <- rbindlist(df.list)
How to modify my code?
Actually I'm not sure whether my code is correct or not.
Appreciate.
Thanks to @Jaap
, my new code is:
my_read_data <- function(x){
data <- data.table::fread(x, header = T, strip.white = T, fill = T)
data <- data[data[[5]] != 0,]
data <- subset(data, select = c(-1,-7,-10,-12,-13,-14,-15,-17))
}
file.list <- list.files(path = "/path/", pattern = '*.csv')
dt.list <- sapply(file.list, my_read_data, simplify=FALSE)
However, I get an error.
Error in data.table::fread(x, header = T, strip.white = T, fill = T) :
File not found: C:\Users\PECHEN\AppData\Local\Temp\RtmpiihFR4\filea0c4d726488
In addition: Warning messages:
1: running command 'C:\Windows\system32\cmd.exe /c (TWM-201508 Sales Report(London).csv) > C:\Users\PECHEN\AppData\Local\Temp\RtmpiihFR4\filea0c4d726488' had status 1
2: In shell(paste("(", input, ") > ", tt, sep = "")) :
'(TWM-201508 Sales Report(London).csv) > C:\Users\PECHEN\AppData\Local\Temp\RtmpiihFR4\filea0c4d726488' execution failed with error code 1
Moreover, I edit my code:
my_read_data <- function(x){
data <- data.table::fread(x, header = T, strip.white = T, fill = T)
data <- data[data[[5]] != 0,]
data <- subset(data, select = c(-1,-7,-10,-12,-13,-14,-15,-17))
}
file.list <- dir(path = "/path/", pattern='\\.csv', full.names = T)
df.list <- lapply(file.list, my_read_data)
dt <- rbindlist(df.list, idcol = 'id')[, `:=` (YEAR = substr(id,5,8), MONTH = substr(id,9,10))]
I use YEAR = substr(id,5,8), MONTH = substr(id,9,10)
since each data title has four charater before numbers. ex. AAA-201508Sales Report
However, it doesn't work.
Thanks to @Peter TW
, it works.