0

I am a relatively new R user and this is my first question on StackOverflow, so apologies if my question is unclear or obviously stated somewhere else.

I have a large dataset (7.8 GB, 137 million observations) that I have loaded into R in a ffdf format as my understanding is that this will allow me to manipulate the data (with the aim of reducing it to a manageable size) without crashing my computer.

My dataset consists of six features, one of which is a timestamp in the format "2012-10-12 00:30:00 BST". As each observation (electricity readings) is taken at exactly every half hour interval, I would like to categorise the data by which of the 48 half hours in the day the observation takes place. As a first step I would therefore like to separate out the date and the time from the timestamp. (The aim after that is to code this time column from 1-48 for each half hour.)

The following code worked to create a new date column:

ff1$date <- as.character(as.Date(ff1$DateTime))

However, I am struggling to do the same for time and have tried a number of methods based on perhaps crude copying from other examples.

(1) ff1$time <- as.POSIXct(strptime(as.character(ff1$DateTime),"%T"))

(2) ff1$time <- strptime(ff1$DateTime,"%Y-%m-%d %H:%M:%S")

(3) ff1$time <- sapply(strptime(as.character(ff1$DateTime)," "), "[", 2)

None of these work. The errors for each of the three lines above are:

(1) Error in strptime(as.character(ff1$DateTime), "%T") : invalid 'x' argument

(2) Error in strptime(ff1$DateTime, "%Y-%m-%d %H:%M:%S") : invalid 'x' argument

(3) Error in strptime(as.character(ff1$DateTime), " ") : invalid 'x' argument

Is this because the data is in fdff format? Are there other ways of doing this?

Many thanks in advance!

Arjun

dput:

structure(list(LCLid = structure(c(1L, 1L, 1L, 1L), .Label = "MAC000002", class = "factor"), 
    stdorToU = structure(c(1L, 1L, 1L, 1L), .Label = "Std", class = "factor"), 
    DateTime = structure(c(1349998200, 1.35e+09, 1350001800, 
    1350003600), tzone = "", class = c("POSIXct", "POSIXt")), 
    KWH.hh..per.half.hour. = structure(c(1L, 1L, 1L, 1L), .Label = " 0 ", class = "factor"), 
    Acorn = structure(c(1L, 1L, 1L, 1L), .Label = "ACORN-A", class = "factor"), 
    Acorn_grouped = structure(c(1L, 1L, 1L, 1L), .Label = "Affluent", class = "factor"), 
    date = structure(c(1L, 2L, 2L, 2L), .Label = c("2012-10-11", 
    "2012-10-12"), class = "factor")), row.names = c("1", "2", 
"3", "4"), class = "data.frame")

headers of relevant columns:

      LCLid            DateTime
1 MAC000002 2012-10-12 00:30:00
2 MAC000002 2012-10-12 01:00:00
3 MAC000002 2012-10-12 01:30:00
4 MAC000002 2012-10-12 02:00:00
5 MAC000002 2012-10-12 02:30:00
6 MAC000002 2012-10-12 03:00:00
jay.sf
  • 60,139
  • 8
  • 53
  • 110
arj
  • 25
  • 6

3 Answers3

1

The code you are trying is giving errors probably because the column "DateTime is not of class "POSIXt", "POSIXct". So first coerce to a date/time class, then extract the time only.

ff1$DateTime <- as.POSIXct(ff1$DateTime)
format(ff1$DateTime, format = "%T")
#[1] "00:30:00"

Edit.

If the above gives an error try

ff1$DateTime <- as.POSIXct(as.character(ff1$DateTime))
format(ff1$DateTime, format = "%T")

Data.

ff1 <- data.frame(DateTime = "2012-10-12 00:30:00 BST")
Rui Barradas
  • 70,273
  • 8
  • 34
  • 66
  • thanks Rui! i get the error ```Error in as.POSIXct.default(ff1$DateTime) : do not know how to convert 'ff1$DateTime' to class “POSIXct”``` – arj Jan 16 '20 at 13:24
  • 1
    `format(as.POSIXct(as.character(ff1$DateTime)), format="%T")`, probably. – jay.sf Jan 16 '20 at 13:47
  • i think this has solved it! ```ff1$Time <- format(ff1$DateTime, format = "%T")``` – arj Jan 16 '20 at 18:23
1

If you use dates and times a lot, lubridate may become helpful. Here I use ymd_hms() to convert the year-month-day hour-minute-second format into an actual datetime. Then use format.

This is not materially different than the other solutions, just a different way of converting back to a datetime.

Code:

library(lubridate)

ff1$time <- format(ymd_hms(ff1$DateTime), format = "%H:%M:%S")

Result:

> ff1
      LCLid stdorToU            DateTime KWH.hh..per.half.hour.   Acorn Acorn_grouped       date     time
1 MAC000002      Std 2012-10-11 19:30:00                     0  ACORN-A      Affluent 2012-10-11 19:30:00
2 MAC000002      Std 2012-10-11 20:00:00                     0  ACORN-A      Affluent 2012-10-12 20:00:00
3 MAC000002      Std 2012-10-11 20:30:00                     0  ACORN-A      Affluent 2012-10-12 20:30:00
4 MAC000002      Std 2012-10-11 21:00:00                     0  ACORN-A      Affluent 2012-10-12 21:00:00
  • thanks adam. the error i get here is ```Error in `[[<-.ffdf`(`*tmp*`, i, value = character(0)) : assigned value must be ff``` – arj Jan 16 '20 at 16:23
0

You could use strsplit.

sapply(strsplit(as.character(dat$x), " "), `[`, 1)
# [1] "2012-10-12" "2012-10-12" "2012-10-12" "2012-10-12" "2012-10-12"
sapply(strsplit(as.character(dat$x), " "), `[`, 2)
# [1] "00:30:00" "00:30:00" "00:30:00" "00:30:00" "00:30:00"

Data:

x <- "2012-10-12 00:30:00 BST"
dat <- data.frame(x=replicate(5, x))
jay.sf
  • 60,139
  • 8
  • 53
  • 110
  • thanks jay for the quick reply! i tried the strsplit as suggested and got the error "Error in strsplit(ff1$DateTime, " ") : non-character argument" – arj Jan 16 '20 at 13:02
  • I then also tried to create a dataframe: ```t1 <- data.frame(x=replicate(5, ff1$DateTime), stringsAsFactors=F)``` my first question is that this would just take a subset, right? i guess i had in mind editing the whole dataset so i can categorise the observations by time. – arj Jan 16 '20 at 13:03
  • but also i got an error ```Error in as.data.frame.default(x[[i]], optional = TRUE, stringsAsFactors = stringsAsFactors) : cannot coerce class ‘c("ff_vector", "ff")’ to a data.frame``` – arj Jan 16 '20 at 13:04
  • @arj Wrapping an `as.character(.)` around the data should work, see update. – jay.sf Jan 16 '20 at 13:07
  • @arj Better always provide data with `dput`. Please read: https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example/5963610#5963610 – jay.sf Jan 16 '20 at 13:10
  • added a dput, jay. when i add as.character, the error i get is ```Error in BATCHBYTES%/%RECORDBYTES : non-numeric argument to binary operator``` – arj Jan 16 '20 at 13:25
  • @arj Nicely done! `sapply(strsplit(as.character(dat$DateTime), " "), "[", 1)` works for me? – jay.sf Jan 16 '20 at 13:40
  • if i enter: ```date <- sapply(strsplit(as.character(ff1$DateTime), " "), `[`, 1)``` i get : ```Error in strsplit(as.character(ff1$DateTime), " ") : non-character argument``` – arj Jan 16 '20 at 13:48
  • @arj Are you sure? It works for me with your `dput`, also the other solution works with `as.character`. Perhaps start a fresh R session. – jay.sf Jan 16 '20 at 13:51
  • sorry for the delay - restarting was problematic, but have done it - to no avail unfortunately. have copied your text directly : ```ff1$time <- sapply(strsplit(as.character(ff1$DateTime), " "), "[", 2)``` the non-character error remains – arj Jan 16 '20 at 18:17
  • @arj but on the `dput` it works, where's the difference? – jay.sf Jan 16 '20 at 18:41