0

I have a csv with + 1 million lines. This is daily weather data, every 6 hours UTC for the same day in yyyy-mm-dd HH:MM format.

I would like to create a separate column with the information for the day date_ymd in date format yyyy-mm-dd and a separate column hour_UTC for time format HH:MM clock.

My goal is to be able to add and subtract time with this hour_UTC column, so I want it to have a time format instead of a character.

I tried to perform the operations using Posixct and lubridate, but without success, returning NAs in the hour_UTC column.

df<-structure(list(regiao = c(7L, 15L, 4L, 3L, 15L, 8L, 9L, 18L, 
15L, 7L, 14L, 13L, 9L, 1L, 6L), date_edit = c("2002-11-02 12:00", 
"2012-10-15 18:00", "2009-10-20 18:00", "1996-10-13 00:00", "1997-09-18 06:00", 
"2010-08-20 00:00", "1993-09-05 12:00", "1999-11-20 12:00", "1990-03-01 00:00", 
"2003-12-23 12:00", "1991-01-23 18:00", "1994-02-16 18:00", "2018-10-18 12:00", 
"2003-03-15 06:00", "2007-12-09 18:00"), mean = c(297.799987792969, 
301.003334424175, 298.758956947398, 299.499620090601, 298.328689034104, 
305.724042300979, 298.741031707088, 295.561164916551, 296.31868576158, 
297.5, 300.400186586472, 300.891081546832, 301.566059015974, 
296.400024414062, 301.456544724043)), class = "data.frame", row.names = c(NA, 
-15L))

df2<-df%>%
  mutate(datetime=as.POSIXct(date_edit, format="%Y-%m-%d %H:%M", tz="UTC"))%>%
   #create column yyymmdd format
  mutate(date_ymd=as.Date(datetime))%>%
   #creat column time formate hh:mm
  mutate(hour_UTC=lubridate::hm(datetime))
    

The lubridate documentation informs that it is not capable of working with large amounts of data, I'm confused: https://lubridate.tidyverse.org/reference/parse_date_time.html

Important note: the time information present in date_time represents the time the temperature was measured, for example 18:00 means that it was collected at 6:00 pm UTC. This is important to say, as "hour" cannot be confused with "elapsed time". Something that "took" 18 hours to complete is different from something that finished at 6:00pm.

wesleysc352
  • 579
  • 1
  • 8
  • 21
  • 1
    Perhaps `mutate(hour_UTC=hms::as.hms(datetime))` – Jon Spring Feb 21 '23 at 01:40
  • @JonSpring this command apparently converts to an unknown data type instead of a time format. – wesleysc352 Feb 21 '23 at 02:05
  • Does this answer your question? [Split date-time column into Date and time variables](https://stackoverflow.com/questions/19292438/split-date-time-column-into-date-and-time-variables) – divibisan Feb 21 '23 at 17:14
  • 1
    I can't find where lubridate says it doesn't work with large amounts of data, but I'd guess that's an issue with the way `parse_date_time` tests multiple data formats. If the date format is consistent, as in your example, you can use much simpler functions to parse it – divibisan Feb 21 '23 at 17:20
  • @divibisan this doesn't completely solve it, because if I apply the suggestion, my output for `hour_UTC` will be a column of type `character` so I won't be able to work with the time. – wesleysc352 Feb 21 '23 at 17:37
  • @wesleysc352 How do you want to work with the time? All the normal date formats require date too. If you want something a pure time object, it looks like the [chron package](https://stackoverflow.com/q/22659947/8366499) might have what you need – divibisan Feb 21 '23 at 18:22
  • @divibisan unfortunately using chron I couldn't. Even converting the characters to time (hms), it was not possible to perform mathematical operations. For example I want to rewind 4 hours from midnight `00:00:00` so that from `20:00:00`, chorn returns a negative integer. – wesleysc352 Feb 21 '23 at 22:05

0 Answers0