3

After reading in a large data set with read.csv.ffdf, one of the columns holds timestamps such as 2014-10-18 00:01:02, for about 1 million rows. That column is read as a factor. How do I convert it to a POSIXct that ff supports? Simply calling as.POSIXct() turns the values into NA.

Alternatively, can I specify that the column is POSIXct when I first read in the data set?

My goal is to extract the month and day (or even the hour), so I'm open to solutions other than converting to POSIXct.

For example, take this 9-by-2 table:

test <- read.csv.ffdf(file = "test.csv", header = TRUE, first.rows = -1)

The two columns are ID (numeric class) and time (factor class).

Here is the dput output:

structure(list(virtual = structure(list(VirtualVmode = c("integer", 
"integer"), AsIs = c(FALSE, FALSE), VirtualIsMatrix = c(FALSE, 
FALSE), PhysicalIsMatrix = c(FALSE, FALSE), PhysicalElementNo = 1:2, 
    PhysicalFirstCol = c(1L, 1L), PhysicalLastCol = c(1L, 1L)), .Names = c("VirtualVmode", 
"AsIs", "VirtualIsMatrix", "PhysicalIsMatrix", "PhysicalElementNo", 
"PhysicalFirstCol", "PhysicalLastCol"), row.names = c("ID", "time"
), class = "data.frame", Dim = c(9L, 2L), Dimorder = 1:2), physical = structure(list(
    ID = structure(list(), physical = <pointer: 0x000000000821ab20>, virtual = structure(list(), Length = 9L, Symmetric = FALSE), class = c("ff_vector", 
    "ff")), time = structure(list(), physical = <pointer: 0x000000000821abb0>, virtual = structure(list(), Length = 9L, Symmetric = FALSE, Levels = c("10/17/2003 0:01", 
    "12/5/1999 0:02", "2/1/2000 0:01", "3/23/1998 0:01", "3/24/2013 0:00", 
    "5/29/2004 0:00", "5/9/1985 0:01", "6/14/2010 0:01", "6/25/2008 0:02"
    ), ramclass = "factor"), class = c("ff_vector", "ff"))), .Names = c("ID", 
"time")), row.names = NULL), .Names = c("virtual", "physical", 
"row.names"), class = "ffdf")
MM Cui
  • Please provide a small sample of your data with the output of `dput(head(data))` – Rich Scriven Oct 18 '14 at 16:56
  • For the factor conversion, you'll need to do an `as.character` on the column first. Then you can pass that into `as.POSIXct`. – hrbrmstr Oct 18 '14 at 17:26
  • It seems that after applying as.character, the column is still factor class. I think the problem is that ff doesn't support character.... maybe I'm mistaken... – MM Cui Oct 18 '14 at 17:48
  • K forget the `dput` we can't use it because of the pointer. My fault – Rich Scriven Oct 18 '14 at 18:04

2 Answers

1

You can use `with` from the ffbase package, as shown below on a toy example.

library(ff)
library(ffbase)

# Toy data: 100,000 timestamps stored as a factor, as in the question
x <- data.frame(id = 1:100000, timepoint = seq(from = Sys.time(), by = "sec", length.out = 100000))
x$timepoint <- as.factor(x$timepoint)

xff <- as.ffdf(x)
class(xff)

# Convert in chunks of 10,000 rows: factor -> character -> POSIXct
xff$time <- with(xff, as.POSIXct(as.character(timepoint)), by = 10000)
ramclass(xff$time)
# [1] "POSIXct" "POSIXt" 
str(xff[1:10, ])
# 'data.frame':   10 obs. of  3 variables:
#  $ id       : int  1 2 3 4 5 6 7 8 9 10
#  $ timepoint: Factor w/ 100000 levels "2014-10-20 09:14:10",..: 1 2 3 4 5 6 7 8 9 10
#  $ time     : POSIXct, format: "2014-10-20 09:14:10" "2014-10-20 09:14:11" "2014-10-20 09:14:12" "2014-10-20 09:14:13" ...
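
Note that the levels in the question's dput look like 10/17/2003 0:01, which as.POSIXct does not parse by default, so an explicit format string is probably needed there. Here is a minimal base-R sketch of that step, assuming the values are month/day/year and using three levels copied from the dput. The variable names and the UTC time zone are illustrative choices, not something taken from the question.

```r
# A few of the levels shown in the question's dput output
time_factor <- factor(c("10/17/2003 0:01", "12/5/1999 0:02", "2/1/2000 0:01"))

# Assumption: values are month/day/year hour:minute; UTC is an arbitrary choice
parsed <- as.POSIXct(as.character(time_factor),
                     format = "%m/%d/%Y %H:%M", tz = "UTC")

# Extract the pieces the question asks for
month <- as.integer(format(parsed, "%m"))
day   <- as.integer(format(parsed, "%d"))
hour  <- as.integer(format(parsed, "%H"))
```

On an ffdf, the same as.POSIXct(..., format = ...) call should be usable inside with(), as in the toy example above, though I have not run it against the question's file.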
0

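Use colClasses when reading in the data, e.g. with your example of two columns, ID (numeric class) and time (factor class). This relies on the timestamps being in a format that as.POSIXct parses by default, such as 2014-10-18 00:01:02: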

test <- read.csv.ffdf(file = "test.csv", header = TRUE, first.rows = -1, colClasses = c("integer", "POSIXct"))
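
To show what colClasses = "POSIXct" does, here is a small sketch using base read.csv on inline text rather than read.csv.ffdf. It assumes read.csv.ffdf hands colClasses on to the same kind of reader, and the two sample rows are made up for illustration.

```r
# Hypothetical two-row file in the default timestamp format
csv_text <- "ID,time
1,2014-10-18 00:01:02
2,2014-10-18 00:01:03"

# colClasses converts the second column with as.POSIXct while reading
df <- read.csv(text = csv_text, header = TRUE,
               colClasses = c("integer", "POSIXct"))

class(df$time)
month <- format(df$time, "%m")
```

If the file instead holds values like 10/17/2003 0:01, this conversion is likely to fail, and reading the column as a factor and converting it afterwards with an explicit format is the safer route.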
David Arenburg
HywelMJ