My code snippet and the timing output are below.
Any suggestions or alternative options for getting the read time below one minute?
##########RUN FROM R 64-bit, Windows 10###########################
> #automation to import large clog data into R
> gc()
          used (Mb) gc trigger (Mb) max used (Mb)
Ncells  363072 19.4     592000 31.7   460000 24.6
Vcells 6672707 51.0   10309224 78.7  7293876 55.7
> memory.limit(size=20000)
[1] 20000
> library(data.table)
data.table 1.10.4
The fastest way to learn (by data.table authors): https://www.datacamp.com/courses/data-analysis-the-data-table-way
Documentation: ?data.table, example(data.table) and browseVignettes("data.table")
Release notes, videos and slides: http://r-datatable.com
> DT <- fread("C:/CLOG-BIG-DATA-PROJECT/WestBengal_0000.txt",sep=",",header=FALSE,
showProgress = TRUE,verbose=TRUE )
###############################################################
#output#########################################################
**17502188 rows and 64 (of 64) columns from 7.143 GB file in 00:17:38**
Read 17502188 rows. Exactly what was estimated and allocated up front
0.000s ( 0%) Memory map (rerun may be quicker)
0.000s ( 0%) sep and header detection
18.283s ( 2%) Count rows (wc -l)
0.000s ( 0%) Column type detection (100 rows at 10 points)
19.296s ( 2%) Allocation of 17502188x64 result (xMB) in RAM
**1019.676s ( 93%) Reading data**
0.107s ( 0%) Allocation for type bumps (if any), including gc time if triggered
0.048s ( 0%) Coercing data already read in type bumps (if any)
39.639s ( 4%) Changing na.strings to NA
**1097.049s Total**
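Below is a rough sketch of alternatives I am considering (not benchmarked yet). The select= column numbers and colClasses types are placeholders, and the nThread option assumes a newer data.table release than the 1.10.4 shown above, where fread became multi-threaded.
###############possible alternatives to try#####################
library(data.table)

## Path copied from above; everything else here is illustrative.
path <- "C:/CLOG-BIG-DATA-PROJECT/WestBengal_0000.txt"

## Option 1: read only the columns actually needed (placeholder column numbers).
## 93% of the ~18 minutes is raw reading, so fewer columns means less work.
DT <- fread(path, sep = ",", header = FALSE,
            select = 1:20,        # placeholder: first 20 of the 64 columns
            showProgress = TRUE)

## Option 2: declare column types up front so no type bumps or coercions occur
## (placeholder: all 64 columns read as character).
# DT <- fread(path, sep = ",", header = FALSE,
#             colClasses = rep("character", 64))

## Option 3: upgrade data.table -- from 1.11.0 onward fread reads in parallel
## and nThread controls the thread count; the 1.10.4 session above is
## single-threaded, so this is likely the biggest single win.
# DT <- fread(path, sep = ",", header = FALSE,
#             nThread = parallel::detectCores())
###############################################################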