
I have a large file with more than 20 million rows. It has 5 date fields, each stored as a character string like this: "2012-12-31". After importing with read.table, each of them comes in as a 'character' column.

I can convert them to POSIXlt with the code below, but it takes a long time to run. I want to avoid this step and have the date fields imported as POSIXlt directly. Is there a way to do this?

I have also tried passing the colClasses argument of read.table a vector of "as.POSIXlt" values, and a vector of "POSIXlt" values. Neither seems to work.

# Logical index of the five date columns
date_cols <- names(input) %in% c("DATE1", "DATE2", "DATE3", "DATE4", "DATE5")
# lapply (not sapply) keeps each converted column as a POSIXlt object
input[date_cols] <- lapply(input[date_cols], function(x) as.POSIXlt(as.character(x)))
Selva
  • First, you should avoid POSIXlt: it is a list. Use a numeric format such as Date (more suitable here) or POSIXct. Then you can use `read.zoo` from the zoo package, which reads your date column as an index. But this depends on the rest of your columns, and I am not sure it is efficient for 20 million rows. I would give `fread` from data.table a try. – agstudy Oct 09 '14 at 08:01
  • Perhaps `fread` + `fasttime` is faster than a `read.table` hack. Also relevant: [**this**](http://stackoverflow.com/questions/12898318/convert-character-to-date-quickly-in-r/12898544#12898544) and [**this**](http://stackoverflow.com/questions/18390674/automatically-detect-date-columns-when-reading-a-file-into-a-data-frame) – Henrik Oct 09 '14 at 08:01
  • Thanks @Henrik, I ended up creating a new class and using it to import the character field as a POSIXlt date, as shown in the code below. However, it worked only with `read.table()` and not with `fread`. `setClass('myDate')` `setAs("character","myDate", function(from) as.POSIXlt(fast_strptime(from, "%Y-%m-%d")) )` `colClasses <- c("logical","character", "myDate",)` – Selva Oct 20 '14 at 09:45
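The class-registration trick from the last comment can be sketched with base R alone. This is a minimal sketch under stated assumptions, not the poster's exact code: the class name `isoDate`, the inline sample data, and the column names `FLAG` and `NAME` are hypothetical, and it converts to Date (as the first comment suggests) rather than POSIXlt. The idea is that `read.table` reads a column with an unrecognised class name as character and then coerces it with `methods::as()`, so a `setAs()` method decides how the strings are parsed.

```r
library(methods)

# Hypothetical class name; it only needs to exist so colClasses can refer to it.
setClass("isoDate")

# Coercion that read.table applies to every column declared as "isoDate".
setAs("character", "isoDate",
      function(from) as.Date(from, format = "%Y-%m-%d"))

# Small inline sample standing in for the real 20-million-row file (assumption).
csv_text <- "FLAG,NAME,DATE1
TRUE,a,2012-12-31
FALSE,b,2013-01-15"

input <- read.table(text = csv_text, header = TRUE, sep = ",",
                    colClasses = c("logical", "character", "isoDate"))

# DATE1 is converted during the import, with no separate conversion pass.
str(input)
```

Swapping the body of the `setAs()` function for a faster parser is a design choice left open here; whether that beats a separate conversion pass on 20 million rows would need to be measured.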
