I have a csv with + 1 million lines. This is daily weather data, every 6 hours UTC for the same day in yyyy-mm-dd HH:MM format.
I would like to create a separate column with the information for the day date_ymd
in date format yyyy-mm-dd and a separate column hour_UTC
for time format HH:MM clock.
My goal is to be able to add and subtract time with this hour_UTC
column, so I want it to have a time format instead of a character.
I tried to perform the operations using Posixct and lubridate, but without success, returning NAs in the hour_UTC
column.
df<-structure(list(regiao = c(7L, 15L, 4L, 3L, 15L, 8L, 9L, 18L,
15L, 7L, 14L, 13L, 9L, 1L, 6L), date_edit = c("2002-11-02 12:00",
"2012-10-15 18:00", "2009-10-20 18:00", "1996-10-13 00:00", "1997-09-18 06:00",
"2010-08-20 00:00", "1993-09-05 12:00", "1999-11-20 12:00", "1990-03-01 00:00",
"2003-12-23 12:00", "1991-01-23 18:00", "1994-02-16 18:00", "2018-10-18 12:00",
"2003-03-15 06:00", "2007-12-09 18:00"), mean = c(297.799987792969,
301.003334424175, 298.758956947398, 299.499620090601, 298.328689034104,
305.724042300979, 298.741031707088, 295.561164916551, 296.31868576158,
297.5, 300.400186586472, 300.891081546832, 301.566059015974,
296.400024414062, 301.456544724043)), class = "data.frame", row.names = c(NA,
-15L))
df2<-df%>%
mutate(datetime=as.POSIXct(date_edit, format="%Y-%m-%d %H:%M", tz="UTC"))%>%
#create column yyymmdd format
mutate(date_ymd=as.Date(datetime))%>%
#creat column time formate hh:mm
mutate(hour_UTC=lubridate::hm(datetime))
The lubridate documentation informs that it is not capable of working with large amounts of data, I'm confused: https://lubridate.tidyverse.org/reference/parse_date_time.html
Important note:
the time information present in date_time
represents the time the temperature was measured, for example 18:00
means that it was collected at 6:00 pm UTC. This is important to say, as "hour" cannot be confused with "elapsed time". Something that "took" 18 hours to complete is different from something that finished at 6:00pm.