I am a relatively new R user and this is my first question on StackOverflow, so apologies if my question is unclear or has already been answered elsewhere.
I have a large dataset (7.8 GB, 137 million observations) that I have loaded into R in ffdf format, as my understanding is that this will allow me to manipulate the data (with the aim of reducing it to a manageable size) without crashing my computer.
My dataset consists of six features, one of which is a timestamp in the format "2012-10-12 00:30:00 BST". As each observation (an electricity reading) is taken at exact half-hour intervals, I would like to categorise the data by which of the 48 half hours in the day the observation falls into. As a first step I would therefore like to separate the date and the time out of the timestamp. (The aim after that is to code this time column from 1 to 48, one value per half hour.)
The following code worked to create a new date column:
ff1$date <- as.character(as.Date(ff1$DateTime))
However, I am struggling to do the same for the time and have tried a number of methods adapted, perhaps crudely, from other examples:
(1) ff1$time <- as.POSIXct(strptime(as.character(ff1$DateTime),"%T"))
(2) ff1$time <- strptime(ff1$DateTime,"%Y-%m-%d %H:%M:%S")
(3) ff1$time <- sapply(strptime(as.character(ff1$DateTime)," "), "[", 2)
None of these work. The errors for each of the three lines above are:
(1) Error in strptime(as.character(ff1$DateTime), "%T") : invalid 'x' argument
(2) Error in strptime(ff1$DateTime, "%Y-%m-%d %H:%M:%S") : invalid 'x' argument
(3) Error in strptime(as.character(ff1$DateTime), " ") : invalid 'x' argument
Is this because the data is in ffdf format? Are there other ways of doing this?
Many thanks in advance!
Arjun
dput:
structure(list(LCLid = structure(c(1L, 1L, 1L, 1L), .Label = "MAC000002", class = "factor"),
stdorToU = structure(c(1L, 1L, 1L, 1L), .Label = "Std", class = "factor"),
DateTime = structure(c(1349998200, 1.35e+09, 1350001800,
1350003600), tzone = "", class = c("POSIXct", "POSIXt")),
KWH.hh..per.half.hour. = structure(c(1L, 1L, 1L, 1L), .Label = " 0 ", class = "factor"),
Acorn = structure(c(1L, 1L, 1L, 1L), .Label = "ACORN-A", class = "factor"),
Acorn_grouped = structure(c(1L, 1L, 1L, 1L), .Label = "Affluent", class = "factor"),
date = structure(c(1L, 2L, 2L, 2L), .Label = c("2012-10-11",
"2012-10-12"), class = "factor")), row.names = c("1", "2",
"3", "4"), class = "data.frame")
First rows of the relevant columns:
      LCLid            DateTime
1 MAC000002 2012-10-12 00:30:00
2 MAC000002 2012-10-12 01:00:00
3 MAC000002 2012-10-12 01:30:00
4 MAC000002 2012-10-12 02:00:00
5 MAC000002 2012-10-12 02:30:00
6 MAC000002 2012-10-12 03:00:00
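
To make the target concrete, here is a minimal sketch of the result I am after, written against an ordinary in-memory data.frame rather than the ffdf. The toy timestamps, the "Europe/London" time zone, the column name half_hour and the convention that 00:00 is slot 1 are all assumptions for illustration. Whether the same calls work on ffdf columns is the open question above.

```r
# Toy in-memory sample (a plain data.frame, NOT an ffdf).
# The time zone is an assumption based on the "BST" suffix in the raw data.
df <- data.frame(
  DateTime = as.POSIXct(
    c("2012-10-12 00:30:00", "2012-10-12 01:00:00", "2012-10-12 23:30:00"),
    tz = "Europe/London"
  )
)

# Time of day as text, formatted in the timestamp's own time zone.
df$time <- format(df$DateTime, "%H:%M:%S")

# Half-hour slot from 1 to 48, assuming 00:00 -> 1, 00:30 -> 2, ..., 23:30 -> 48.
lt <- as.POSIXlt(df$DateTime)
df$half_hour <- lt$hour * 2L + lt$min %/% 30L + 1L

print(df)
```

With this convention the first reading in my data (00:30) lands in slot 2. Shifting the numbering so that 00:30 is slot 1 would only change the constant in the last step.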