
I have a data frame with columns for the combined date and time, the date, and the time, plus the CH4 observations and the calculated Ratio (I have more columns, but they are irrelevant to this question).

'data.frame':   1420847 obs. of  17 variables
$ Start     : Factor w/ 1469 levels "2013-08-31 23:56:09.000",..: 2 2 2 2 2 2 2 2 2 2 ...  
$ CO2       : int  1510 1950 1190 1170 780 870 730 740 680 700 ...
$ CH4       : int  66 77 62 58 34 51 36 43 32 40 ...
$ Ratio     : num  0.0437 0.0395 0.0521 0.0496 0.0436 ...  
$ Start_time: POSIXlt, format: "2013-11-20 00:10:05" "2013-11-20 00:10:05" "2013-11-20 00:10:05" "2013-11-20 00:10:05" ...  
$ Start_date: Date, format: "2013-09-01" "2013-09-01" "2013-09-01" "2013-09-01" ...

Now I wish to split every day into six blocks of 4 hours and assign the numbers 1 to 6 to the blocks. The problem, however, is that I only have the date and time at which each measurement started (Start_date and Start_time, or the combined Start), so I think each Start_time needs to be assigned to a block. The number of observations per measurement varies a lot, so I cannot assign a block simply by counting rows. This is what I wish to accomplish:

                  Start  Start_time    Start_date   CO2 CH4       Ratio  block
2013-09-01 00:10:05.000    00:10:05    2013-09-01  1510  66  0.04370861      1
2013-09-01 00:10:05.000    00:10:05    2013-09-01  1950  77  0.03948718      1
2013-09-01 05:16:55.000    05:16:55    2013-09-01  1190  62  0.05210084      2
2013-09-01 05:16:55.000    05:16:55    2013-09-01  1170  58  0.04957265      2
2013-09-01 05:16:55.000    05:16:55    2013-09-01   780  34  0.04358974      2
2013-09-01 12:44:33.000    12:44:33    2013-09-01   870  51  0.05862069      4
2013-09-01 12:44:33.000    12:44:33    2013-09-01   730  36  0.04931507      4
2013-09-01 22:14:23.000    22:14:23    2013-09-01   740  43  0.05810811      6
2013-09-01 22:14:23.000    22:14:23    2013-09-01   680  32  0.04705882      6
2013-09-02 08:37:05.000    08:37:05    2013-09-02   700  40  0.05714286      3
2013-09-02 08:37:05.000    08:37:05    2013-09-02   610  35  0.05737705      3
2013-09-02 17:22:33.000    17:22:33    2013-09-02   630  25  0.03968254      5
2013-09-02 17:22:33.000    17:22:33    2013-09-02   670  40  0.05970149      5
2013-09-02 23:59:44.000    23:59:44    2013-09-02   640  37  0.05781250      6
2013-09-02 23:59:44.000    23:59:44    2013-09-02   730  35  0.04794521      6

I have searched this website and also tried Google but, so far, I have found no answer. I tried the following code, which I found in an answer on this website, but had no luck.

qaa <- split(df, cut(strptime(paste(df$Start_date, df$Start_time), format = "%Y-%m-%d %H:%M"),"4 hours"))

Previously, I tried to split the number of observations in minutes, so I tried to adjust that code. And to be very honest, I have no idea what I am doing (as you can probably tell).

lst<- split(df, df$Start_date)
nobs <- "4 hours" 
List <- unlist(lapply(lst, function(x) {
  x$grp <- rep(1:(nrow(x)/nobs+1), each = nobs)[1:nrow(x)] 
  split(x, x$grp)}), recursive = FALSE)
b <- as.matrix(do.call("rbind", List))

Just to let you know, again, I am a NOOB concerning R, so it takes me a lot of time to figure everything out. I understand very little of the language, but I am trying my very best to make it work. I really enjoy working with it! If there is already another question like this on this website, please let me know so I can remove this one. I have not found it, though.

Thank you for taking the time to read my question and for considering answering it!

Jalalala

2 Answers


If you can extract the start hour from the start time (try here: Dealing with timestamps in R), you could then use the following to assign the correct block number:

# Assign a block number to each 4-hour window of the day
df$block[df$start_hour >= 0 & df$start_hour < 4] <- 1
df$block[df$start_hour >= 4 & df$start_hour < 8] <- 2
df$block[df$start_hour >= 8 & df$start_hour < 12] <- 3
df$block[df$start_hour >= 12 & df$start_hour < 16] <- 4
df$block[df$start_hour >= 16 & df$start_hour < 20] <- 5
df$block[df$start_hour >= 20 & df$start_hour < 24] <- 6
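For illustration, here is a minimal sketch of how the start hour might be extracted and turned into a block number in one step. It assumes the Start column holds timestamps like those shown in the question; the toy data frame, the start_hour column name, and the choice of UTC as the time zone are assumptions made for the example, not part of the original data.

```r
# Toy data mirroring the question's Start column (a factor of timestamps)
df <- data.frame(
  Start = factor(c("2013-09-01 00:10:05.000", "2013-09-01 05:16:55.000",
                   "2013-09-01 12:44:33.000", "2013-09-01 22:14:23.000",
                   "2013-09-02 08:37:05.000", "2013-09-02 17:22:33.000",
                   "2013-09-02 23:59:44.000"))
)

# Convert the factor to character first, then parse it.
# %OS reads seconds including the fractional part (".000").
# The time zone is an assumption; use whatever zone the data was recorded in.
parsed <- as.POSIXlt(as.character(df$Start),
                     format = "%Y-%m-%d %H:%M:%OS", tz = "UTC")

# The hour component (0-23) of each start time
df$start_hour <- parsed$hour

# Integer division by 4 maps hours 0-3 to 0, 4-7 to 1, ..., 20-23 to 5;
# adding 1 gives block numbers 1 to 6
df$block <- df$start_hour %/% 4 + 1

print(df)
```

The integer division produces the same blocks as the six separate assignments, so either form can be used; the explicit version is easier to adapt if the blocks are not all the same length.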
user2568648
  • Yesss! That did it! I am sorry for my late response. At first, I could not see how the page you gave me was applicable to my df. But I figured it out. Thank you very much! – Jalalala Nov 21 '13 at 11:00

Installing lubridate will help in particular, as it has useful functions like hour. cut2 from Hmisc lets you specify simple brackets by which to split your hours.

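library("lubridate")
library("Hmisc")
example <- as.factor('2013-09-01 00:10:05.000')
# Name the format argument: the second positional argument of as.POSIXct() is tz, not format
example <- data.frame(example, timeslot = cut2(hour(as.POSIXct(example, format = "%Y-%m-%d %H:%M")), cuts = seq(0, 24, 4)))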
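The same bracketing idea can also be sketched with base R's cut, without extra packages. This is an alternative to cut2, not the method used above; the hours vector is made-up sample input matching the start hours in the question, and the variable names are hypothetical.

```r
# Sample start hours (0-23), taken from the example rows in the question
hours <- c(0, 5, 12, 22, 8, 17, 23)

# Break the day at 0, 4, 8, ..., 24. right = FALSE makes the intervals
# closed on the left, i.e. [0,4), [4,8), ..., [20,24), so hour 4 falls
# in block 2 rather than block 1.
block <- cut(hours, breaks = seq(0, 24, 4), right = FALSE, labels = 1:6)

# cut() returns a factor; convert via character to get plain integers
block_num <- as.integer(as.character(block))

print(data.frame(hours, block_num))
```

Using labels = 1:6 yields the numeric block codes directly instead of interval labels, which matches the block column the question asks for.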
Steph Locke