When I run mclapply:

> ListofCSVs <- mclapply(list.files(pattern = "2013"), function(n) {
    read.table(n, header = TRUE, sep = ",", stringsAsFactors = FALSE)
  }, mc.cores = 12)
where list.files(pattern = "2013") lists 12 CSV files:
> list.files(pattern = "2013")
[1] "BONDS 2013 01.csv" "BONDS 2013 02.csv" "BONDS 2013 03.csv" "BONDS 2013 04.csv" "BONDS 2013 05.csv" "BONDS 2013 06.csv" "BONDS 2013 07.csv"
[8] "BONDS 2013 08.csv" "BONDS 2013 09.csv" "BONDS 2013 10.csv" "BONDS 2013 11.csv" "BONDS 2013 12.csv"
I get:

Warning message:
In mclapply(list.files(pattern = ".csv"), function(n) { :
  scheduled core 2, 1 encountered error in user code, all values of the job will be affected

and print(ListofCSVs[1]) returns "fatal error in wrapper code".
I have tried this, but my data still does not load correctly.
This suggests it may be a problem of too many threads.
I can load the files correctly with lapply.
I have also checked that each individual read.table call works, so I do not think it is a data issue:
ai<-read.table("BONDS 2013 i.csv", header=TRUE, sep = ",", stringsAsFactors = FALSE)
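To narrow down which worker is failing, one option is to wrap each read in tryCatch so a failing file returns its name and error message instead of the opaque "scheduled core ... encountered error" warning. This is a hedged diagnostic sketch, not the original code: it writes two small demo CSVs to a temp directory (stand-ins for the real "BONDS 2013 *.csv" files) so it runs standalone, and it assumes a fork-capable OS (Linux/macOS) since mclapply forks.

```r
library(parallel)

# Demo setup (assumption): small stand-in CSVs in a fresh temp
# directory, mimicking the real "BONDS 2013 *.csv" files.
dir <- file.path(tempdir(), "mclapply_demo")
dir.create(dir, showWarnings = FALSE)
for (i in 1:2) {
  write.csv(data.frame(x = 1:3, y = 4:6),
            file.path(dir, sprintf("BONDS 2013 %02d.csv", i)),
            row.names = FALSE)
}
files <- list.files(dir, pattern = "2013", full.names = TRUE)

# Wrap each read in tryCatch so a failing worker returns the file
# name and error message instead of poisoning the whole job.
ListofCSVs <- mclapply(files, function(n) {
  tryCatch(
    read.table(n, header = TRUE, sep = ",", stringsAsFactors = FALSE),
    error = function(e) paste("failed on", basename(n), ":", conditionMessage(e))
  )
}, mc.cores = 2)

# Any element that is a character string rather than a data.frame
# identifies the file that failed and why:
failures <- Filter(is.character, ListofCSVs)
```

With the real files and mc.cores = 12, inspecting failures should show exactly which file (or worker) triggers the error.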
Each CSV is about 1 GB, with 40 columns.
Using foreach with %dopar% also works.
I have run the same mclapply code with two cores, and it does not work.
With one core it works.
The data is in the working directory.
Thanks!
I have 16 cores and 122 GB of RAM (Amazon AWS, Linux).
UPDATE: This works...
> ListofCSVs <- parLapply(cl, list.files(pattern = ".csv"), function(n) {
    read.table(n, header = TRUE, sep = ",", stringsAsFactors = FALSE)
  })
go figure....
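For completeness: the cl in that call has to be created beforehand, and the post does not show how. A minimal sketch of the assumed setup (makeCluster is the usual way; parLapply runs on PSOCK worker processes rather than forked children, which may be why it sidesteps the mclapply failure). Demo files in a temp directory are used so the sketch runs standalone.

```r
library(parallel)

# Demo files (assumption) so the sketch is self-contained:
dir <- file.path(tempdir(), "parlapply_demo")
dir.create(dir, showWarnings = FALSE)
write.csv(data.frame(x = 1:3), file.path(dir, "demo 2013 01.csv"), row.names = FALSE)
write.csv(data.frame(x = 4:6), file.path(dir, "demo 2013 02.csv"), row.names = FALSE)

# Assumed cluster setup for the "cl" used above:
cl <- makeCluster(2)  # e.g. makeCluster(12) on the 16-core machine

ListofCSVs <- parLapply(cl, list.files(dir, pattern = "2013", full.names = TRUE),
                        function(n) {
  read.table(n, header = TRUE, sep = ",", stringsAsFactors = FALSE)
})

stopCluster(cl)  # always release the workers when done
```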