
I have a data.table score of about 900 MB. It has a column datetime that stores timestamps as character strings in a format like "2018-05-25 10:10:53:000000". I am trying to convert this column from class character to POSIXlt using the following code:

score[,newdate := as.POSIXlt.character(score[["datetime"]],tz="IST",format="%Y-%m-%d %H:%M:%S")][,datetime:=NULL]

This operation produces a data.table of size 211 GB. What is happening here? Please help.

dput(head(score))

structure(list(id1 = c(12234398L, 323437283L, 12343344L, 
545465653L, 312342343L, 22344232L), id2 = c(216231535L, 
324345453L, 345474698L, 87787950L, 656565531L, 565656657L), 
Score = c(756L, 777L, 788L, 234L, 656L, 788L), datetime = c("2017-05-08 00:00:00.0000000", 
"2018-07-12 01:24:46.0000000", "2015-16-02 00:00:00.0000000", 
"2016-03-22 23:06:45.0000000", "2016-07-14 12:23:45.0000000", 
"2014-05-03 03:33:13.0000000")), .Names = c("id1", 
"id2", "Score", "datetime"), class = c("data.table", 
"data.frame"), row.names = c(NA, -6L), .internal.selfref = <pointer: 
0x190cc98>)
tushaR

2 Answers


Referring to this link

data.table doesn't accept POSIXlt columns. You can use POSIXct instead of POSIXlt.

score[, newdate := as.POSIXct(datetime, tz = "IST", format = "%Y-%m-%d %H:%M:%S")][, datetime := NULL]
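As a rough illustration of the conversion above, here is a minimal sketch on a two-row stand-in for the real table, with values copied from the question's dput(). One assumption is labeled in the code: it uses "Asia/Kolkata" rather than "IST", because "IST" may not be a recognised time zone name on every platform.

```r
library(data.table)

# Small stand-in for the real table; values taken from the question's dput().
score <- data.table(
  id1 = c(12234398L, 323437283L),
  datetime = c("2017-05-08 00:00:00.0000000", "2018-07-12 01:24:46.0000000")
)

# Assumption: "Asia/Kolkata" stands in for Indian Standard Time, because
# "IST" is not a time zone name that every platform recognises.
score[, newdate := as.POSIXct(datetime, tz = "Asia/Kolkata",
                              format = "%Y-%m-%d %H:%M:%S")][, datetime := NULL]

class(score$newdate)   # "POSIXct" "POSIXt"
typeof(score$newdate)  # "double": one number per row
```

Because a POSIXct column is a single numeric vector, it fits the one-vector-per-column layout that data.table expects.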
zacdav

To explain why the data.table grows so much:

d <- as.POSIXlt.character("2017-05-08 00:00:00.0000000", tz="IST", format="%Y-%m-%d %H:%M:%S")
d
object.size(d) # 2024 Bytes in my configuration
# 2024 bytes

?POSIXlt says that

Class "POSIXlt" is a named list of vectors representing...

This means that every single POSIXlt object consists of many elements representing the components of a date and time.

This costs a lot of memory (I can't remember exactly, but about 80 bytes per POSIXlt element plus the overhead of the vector structure).
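To see the difference in storage, the following base R sketch builds the same 100,000 timestamps as POSIXlt and as POSIXct and compares them. The vector length and the UTC time zone are assumptions chosen for illustration. Exact sizes depend on the R version and platform, so the code quotes no figures.

```r
# Assumption: UTC is used so the result does not depend on local tz data.
x <- rep("2017-05-08 00:00:00", 1e5)

lt <- as.POSIXlt(x, tz = "UTC", format = "%Y-%m-%d %H:%M:%S")
ct <- as.POSIXct(lt)

# POSIXlt: a list with one vector per component (sec, min, hour, mday, ...).
names(unclass(lt))

# POSIXct: a single numeric vector of seconds since the epoch.
typeof(ct)  # "double"

# Compare the memory footprint of the two representations.
object.size(lt)
object.size(ct)
```

The POSIXlt version stores one full-length vector for each component, while the POSIXct version stores a single double per timestamp.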

The assignment operator of data.table (:=) handles lists specially (each list element is assigned to a different column), so your code snippet produces a warning like:

Warning message:
In `[.data.table`(data, , `:=`(newdate, as.POSIXlt.character(data[["datetime"]],  :
  Supplied 11 items to be assigned to 6 items of column 'newdate' (5 unused)
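The list handling described above can be seen in a minimal sketch with a hypothetical table dt (the table and column names are invented for illustration). When the right-hand side of := is a list, each list element fills one of the named columns. A POSIXlt value is itself a list of component vectors, which is presumably why the warning counts 11 items for a single target column.

```r
library(data.table)

# Hypothetical three-row table used only for this illustration.
dt <- data.table(id = 1:3)

# Each element of the list on the right-hand side fills one named column.
dt[, c("a", "b") := list(c(10, 20, 30), c("x", "y", "z"))]

names(dt)  # "id" "a" "b"
```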

R Yoda
    Thanks for the explanation. I read somewhere in one of the links that @zacdav shared that each POSIXlt element takes around 40 bytes of memory. – tushaR May 25 '18 at 06:32