I have data for the third quarter (July-September) by date-hour, e.g. 07/01/2013 0:00, 07/01/2013 1:00. I should have at most 92 * 24 = 2208 observations. For some reason, some of my data frames have more than 2208 observations.

Here is the dput of my data frame:

dput(head)
structure(list(DATEHOUR = c("07-01-13 0:00", "07-01-13 10:00", 
"07-01-13 11:00", "07-01-13 12:00", "07-01-13 13:00", "07-01-13 14:00"
), ImpressionsA.x = c(156, 564, 884, 1365, 1864, 1470), ImpressionsM.x = c(83, 
274, 338, 664, 807, 757), ImpressionsA.y = c(0.4, 0, 0.4, 0, 
0, 0), ImpressionsM.y = c(0.2, 0, 0.3, 0, 0, 0), Branded = c(0, 
0, 0, 0, 0, 0), ESI = c(0, 0, 0, 0, 0, 0), ImpressionsA.T = c(156.4, 
564, 884.4, 1365, 1864, 1470), ImpressionsM.T = c(83.2, 274, 
338.3, 664, 807, 757), Leads.T = c(0, 0, 0, 0, 0, 0)), .Names = c("DATEHOUR", 
"ImpressionsA.x", "ImpressionsM.x", "ImpressionsA.y", "ImpressionsM.y", 
"Branded", "ESI", "ImpressionsA.T", "ImpressionsM.T", "Leads.T"
), row.names = c(1L, 3L, 4L, 5L, 6L, 7L), class = "data.frame")

I read the following posts and links: http://astrostatistics.psu.edu/su07/R/html/base/html/strptime.html, "format a Date column in a Data Frame", and "Convert data frame with date column to timeseries". I then tried `test$timestamp<-as.Date(as.character(test$DATEHOUR), format="%m%d%Y%I%M")` and variations of it, but it's not working. My goal is a time series with 2208 observations (or however many unduplicated observations there are). I am new to R and to coding in general, so please excuse my rookie understanding of the syntax.

vagabond
  • So you read `?strptime`, but decided to use `as.Date`? – Joshua Ulrich Jul 22 '14 at 17:53
  • @JoshuaUlrich Like I said, I haven't worked with time series data in R before, and I got confused by the multiple ways R can handle date-hours. This exercise has helped me learn about `?POSIXct`, `?lubridate` and `?strptime`. I will edit my question with more detail on the problems I faced because of the specific nature of my data, especially date formats from CSV files originally made in Excel! – vagabond Jul 23 '14 at 16:14

2 Answers

Try this:

> as.POSIXct(dd$DATEHOUR, format="%m-%d-%y %H:%M")
[1] "2013-07-01 00:00:00 PDT" "2013-07-01 10:00:00 PDT" "2013-07-01 11:00:00 PDT" "2013-07-01 12:00:00 PDT"
[5] "2013-07-01 13:00:00 PDT" "2013-07-01 14:00:00 PDT"
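
For context, the attempt in the question fails for two reasons: `as.Date` discards the time of day, and the format string `"%m%d%Y%I%M"` has no separators, uses `%Y` (four-digit year) where the data has a two-digit year, and uses `%I` (12-hour clock) where the data is 24-hour. A minimal sketch of the difference, assuming a hypothetical data frame `test` with one duplicated row added for illustration and a fixed `tz = "UTC"` (both are assumptions, not part of the question's data):

```r
# Hypothetical sample in the same "mm-dd-yy H:MM" layout as the question,
# with one duplicated row added for illustration
test <- data.frame(
  DATEHOUR = c("07-01-13 0:00", "07-01-13 10:00", "07-01-13 10:00", "07-01-13 14:00"),
  stringsAsFactors = FALSE
)

# The format from the question does not match the strings, so parsing yields NA
bad <- as.Date(test$DATEHOUR, format = "%m%d%Y%I%M")

# %m-%d-%y matches the dashes and the two-digit year; %H:%M is the 24-hour time
test$timestamp <- as.POSIXct(test$DATEHOUR, format = "%m-%d-%y %H:%M", tz = "UTC")

# Keep one row per timestamp and count what remains
dedup <- test[!duplicated(test$timestamp), ]
nrow(dedup)
```

Fixing the time zone explicitly is a design choice here: with a local zone that observes daylight saving time, some hours can be missing or repeated, which would distort the expected count.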
Señor O

If you are too lazy to write the format string manually, you could try the lubridate package:

library(lubridate)
mdy_hm(df$DATEHOUR)

## [1] "2013-07-01 00:00:00 UTC" "2013-07-01 10:00:00 UTC" "2013-07-01 11:00:00 UTC"
## [4] "2013-07-01 12:00:00 UTC" "2013-07-01 13:00:00 UTC" "2013-07-01 14:00:00 UTC"
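
Once the column is parsed, the count from the question (92 * 24 = 2208) can be checked against a complete hourly sequence. This is only a sketch in base R: the `parsed` vector is a hypothetical stand-in for the parsed column, and UTC is an assumption that avoids daylight-saving gaps and repeats.

```r
# Full hourly grid for July-September 2013 (UTC is an assumption)
grid <- seq(
  from = as.POSIXct("2013-07-01 00:00", tz = "UTC"),
  to   = as.POSIXct("2013-09-30 23:00", tz = "UTC"),
  by   = "hour"
)
length(grid)  # 92 days * 24 hours

# Hypothetical parsed column containing one repeated hour
parsed <- as.POSIXct(
  c("2013-07-01 00:00", "2013-07-01 01:00", "2013-07-01 01:00"),
  tz = "UTC"
)

parsed[duplicated(parsed)]     # hours that occur more than once
head(grid[!grid %in% parsed])  # hours in the grid that are missing from the data
```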
David Arenburg
  • Well, I've run into a strange problem. I ran this: `require(lubridate); df$DATEHOUR2 <- dmy_hm(df$DATEHOUR)`, which does the trick, except that some of my dates are going completely haywire. For instance, 07-17-13 02:00 becomes 2047-07-21 13:24:48, and 07-16-13 11:00 turns into 2013-01-07 11:00:00. My CSV file is clean, with no spaces, and my date-hour variable is of class character. Scratching my head! – vagabond Jul 23 '14 at 14:41
  • See edit, I thought that days were first, while in reality the months were. – David Arenburg Jul 23 '14 at 14:43
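
The comments above come down to field order: the strings are month-day-year, so a day-first parser misreads them. One quick check, not taken from the thread, is to parse with an explicit base-R format in each order and count the values that fail. The sketch below uses the two timestamps quoted in the comments and assumes `tz = "UTC"`.

```r
# Two timestamps quoted in the comments above
x <- c("07-17-13 02:00", "07-16-13 11:00")

# Day-first format: "17" and "16" are not valid months, so these fail to parse
day_first <- as.POSIXct(x, format = "%d-%m-%y %H:%M", tz = "UTC")

# Month-first format matches the data
month_first <- as.POSIXct(x, format = "%m-%d-%y %H:%M", tz = "UTC")

sum(is.na(day_first))    # values the day-first format cannot parse
sum(is.na(month_first))  # values the month-first format cannot parse
```

A nonzero NA count under one order and zero under the other is a strong hint about which order the file actually uses.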