I am reading a CSV file using the data.table package.
Sample dataset:
structure(list(Size = c(4886L, 4096L, 84848L, 518L, 264158L,
725963L, 1340L, 75264L, 198724L), ModifiedTime = c("Jun 11, 2009 06:51:08 PM",
"Aug 21, 2008 03:54:28 PM", "Feb 12, 2007 12:40:00 PM", "Aug 22, 2006 02:12:03 PM",
"Dec 08, 2009 03:28:14 PM", "Sep 29, 2008 03:45:21 PM", "Sep 07, 2011 03:36:54 AM",
"Jul 28, 2011 05:09:58 PM", "Jul 23, 2012 02:25:58 PM"), AccessTime = c("Mar 15, 2013 09:24:53 AM",
"May 12, 2009 04:45:41 PM", "Apr 07, 2014 09:39:03 AM", "Dec 25, 2007 06:48:18 AM",
"Apr 08, 2013 11:52:15 AM", "May 17, 2011 08:48:40 AM", "Mar 12, 2013 02:55:01 AM",
"Jun 07, 2014 04:21:28 PM", "Jan 21, 2013 12:58:07 PM"), contentid = c("000000285b7925f511b3159a72f80a4a",
"0000011afae4d1227c4df57b410ea52c", "000001cec02017ca3eb81ddc4cd1c9ff",
"00000233565d1c17c3135a9504c455ca", "000003020ba74b9d1b6075d3c1b8fcb3",
"0000034b98d29d84ce7b61ee68be7658", "000004ed899e26ae1c9b1ece35a98af1",
"000005a09fd2eb706c5800eb06084160", "0000060b9d552c35f281b5033dcfa1b4"
)), row.names = c(NA, -9L), class = c("data.table", "data.frame"))
While loading, I can specify the type of each column being read, for example:
tble = fread("sample.csv", colClasses = c(Size="numeric", contentid="character"))
My question is:
- Is it possible to specify how to parse the date column while loading? For example, I know I can convert the date column afterwards with
as.Date(tble$AccessTime, "%b %d, %Y %I:%M:%S %p")
but can I specify this format while loading, so that the column is read as a datetime column instead of character?
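To make the post-load conversion concrete, here is a minimal sketch of that fallback. It assumes an English locale (so that `%b` matches "Jun", "Aug", and so on), uses `tz = "UTC"` purely as a placeholder time zone, and reads two rows of the sample from an in-memory string instead of the real file. `%I` is used rather than `%H` because the timestamps are on a 12-hour clock with an AM/PM marker.

```r
library(data.table)

# Small in-memory stand-in for the real CSV (two rows from the sample above).
csv_text <- 'Size,ModifiedTime,AccessTime,contentid
4886,"Jun 11, 2009 06:51:08 PM","Mar 15, 2013 09:24:53 AM",000000285b7925f511b3159a72f80a4a
4096,"Aug 21, 2008 03:54:28 PM","May 12, 2009 04:45:41 PM",0000011afae4d1227c4df57b410ea52c'

# Step 1: read the file; the two timestamp columns come in as character.
tble <- fread(text = csv_text,
              colClasses = c(Size = "numeric", contentid = "character"))

# Step 2: convert both timestamp columns by reference with := .
# %I is the 12-hour clock and pairs with %p (AM/PM).
# tz = "UTC" is an assumption; substitute the zone the timestamps were recorded in.
date_cols <- c("ModifiedTime", "AccessTime")
fmt <- "%b %d, %Y %I:%M:%S %p"
tble[, (date_cols) := lapply(.SD, as.POSIXct, format = fmt, tz = "UTC"),
     .SDcols = date_cols]

str(tble)
```

Using `as.POSIXct` instead of `as.Date` keeps the time of day, which matters if the goal is a datetime column rather than a date column.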
Edit: My reason for wanting to specify the parsing at load time is that I assume it would make the CSV load faster. (I am not sure whether this assumption is correct.)
PS: Note that I have to use data.table because my CSV file is very large (~5 GB).