
I have a data frame called data_clean. Participants completed multiple surveys across 10 days. One of the variables, distress, holds a clock time in hours and minutes (e.g. 1:00, 23:15, 2:45) and is stored as a categorical (character) variable. I created a spaghetti plot, but the y-axis shows the times in alphabetical rather than chronological order. I assume this is because the variable is categorical. This is the code I used:

slopes_np<-ggplot(data = data_clean, aes(x=survey,y=distress, group=id, color=id))+
  facet_wrap(~PP_rec)+
  geom_smooth(method='lm', se=FALSE, linewidth=0.5)+ # ggplot2 >= 3.4.0 uses linewidth instead of size for lines
  xlab("Survey")+ylab("Distress")+
  theme(legend.position = "none")
slopes_np

#a smoother for the overall group
smooth_nopoints <- slopes_np + stat_smooth(aes(group = 1), method = "lm", se = FALSE, colour="black", linewidth=1)
smooth_nopoints

This is the plot I got: [image: plot with wrongly ordered y-axis]
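The alphabetical order comes from treating distress as text: a character or factor axis is laid out in level order, so for example "10:00" sorts before "2:00". A minimal sketch of one workaround, assuming every value has the form "H:MM", is to convert the times to decimal hours and format the hourly breaks back into "H:MM" labels. `hm_to_hours()`, `hours_to_label()` and `plot_distress_hours()` are hypothetical helper names, and `linewidth` assumes ggplot2 3.4.0 or later.

```r
# Hypothetical helper: convert "H:MM" strings such as "0:00" or "23:45"
# into decimal hours since midnight (e.g. "23:45" -> 23.75).
hm_to_hours <- function(x) {
  parts <- strsplit(as.character(x), ":", fixed = TRUE)
  vapply(parts, function(p) as.numeric(p[1]) + as.numeric(p[2]) / 60,
         numeric(1))
}

# Hypothetical helper: turn decimal hours back into "H:MM" axis labels.
hours_to_label <- function(h) {
  total_min <- as.integer(round(h * 60))
  sprintf("%d:%02d", total_min %/% 60L, total_min %% 60L)
}

# Hypothetical plotting function mirroring the code in the question,
# but with a numeric y-axis and one break per hour.
plot_distress_hours <- function(df) {
  df$distress_hours <- hm_to_hours(df$distress)
  ggplot2::ggplot(df, ggplot2::aes(x = survey, y = distress_hours,
                                   group = id, color = id)) +
    ggplot2::facet_wrap(~PP_rec) +
    ggplot2::geom_smooth(method = "lm", formula = y ~ x, se = FALSE,
                         linewidth = 0.5) +
    ggplot2::scale_y_continuous(breaks = 0:23, labels = hours_to_label) +
    ggplot2::labs(x = "Survey", y = "Distress") +
    ggplot2::theme(legend.position = "none")
}

hours_to_label(hm_to_hours(c("23:45", "1:00", "6:15", "0:00")))
# "23:45" "1:00" "6:15" "0:00"
```

With the full data this would be called as `plot_distress_hours(data_clean)`; the `facet_wrap(~PP_rec)` line needs the PP_rec column. Note that times on either side of midnight (23:45 and 0:00) still end up at opposite ends of this axis.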

I tried converting it with as.POSIXct, but then the y-axis also shows irrelevant dates.

data_clean$distress_time <- as.POSIXct(data_clean$distress, format="%H:%M")
class(data_clean$distress_time)

theme_set(theme_bw(base_size = 7, base_family = ""))

slopes_np<-ggplot(data = data_clean, aes(x=survey,y=distress_time, group=id, color=id))+
  facet_wrap(~PP_rec)+
  geom_smooth(method='lm', se=FALSE, linewidth=0.5)+ # ggplot2 >= 3.4.0 uses linewidth instead of size for lines
  xlab("Survey")+ylab("Distress")+
  theme(legend.position = "none")
slopes_np

#a smoother for the overall group
smooth_nopoints <- slopes_np + stat_smooth(aes(group = 1), method = "lm", se = FALSE, colour="black", linewidth=1)
smooth_nopoints

And I got this plot, which is not even close to what I need: [image: plot with irrelevant dates on the y-axis]
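The dates appear because `as.POSIXct()` always stores a full date-time: when the format only contains `%H:%M`, R fills in the current date (see `?strptime`). One possible workaround, sketched here under the assumption that all values share that arbitrary date, is to keep the POSIXct column and show only the clock part through the `date_labels` and `date_breaks` arguments of `scale_y_datetime()`. `plot_distress_posix()` is a hypothetical name, and `tz = "UTC"` is an assumption chosen to avoid daylight-saving gaps.

```r
# Parsing with only "%H:%M" fills in today's date; only the clock part matters.
distress_time <- as.POSIXct(c("23:45", "0:00", "6:15"),
                            format = "%H:%M", tz = "UTC")
format(distress_time, "%H:%M")  # "23:45" "00:00" "06:15"

# Hypothetical plotting function: keep POSIXct on the y-axis but label
# it with hours and minutes only, one break per hour.
plot_distress_posix <- function(df) {
  df$distress_time <- as.POSIXct(df$distress, format = "%H:%M", tz = "UTC")
  ggplot2::ggplot(df, ggplot2::aes(x = survey, y = distress_time,
                                   group = id, color = id)) +
    ggplot2::facet_wrap(~PP_rec) +
    ggplot2::geom_smooth(method = "lm", formula = y ~ x, se = FALSE,
                         linewidth = 0.5) +
    ggplot2::scale_y_datetime(date_labels = "%H:%M", date_breaks = "1 hour") +
    ggplot2::labs(x = "Survey", y = "Distress") +
    ggplot2::theme(legend.position = "none")
}
```

Because every value gets the same date, this still places 0:00 at the bottom of the axis and 23:45 near the top.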

I only want to show hours and minutes on the y-axis, in chronological order (for instance 0:00, 1:00, 2:00, 3:00, …, 19:00, 20:00, 21:00, 22:00, 23:00). I would appreciate your help in creating this plot.

Here is the relevant data for two participants (the original data set is very large):

data_clean <- structure(list(
  id = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
                   9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L),
    levels = c("400", "401", "402", "403", "404", "405", "406", "407",
               "408", "409", "411", "412", "413", "414", "415", "550",
               "551", "552", "553", "554", "555", "558", "559", "560",
               "561", "563", "565", "566", "567", "568", "569", "570",
               "571", "572", "573", "574", "575", "576", "577", "578",
               "580", "590", "591"), class = "factor"),
  distress = c("23:45", "23:30", "23:30", "23:45", "23:30",
               "22:30", "23:30", "21:30", "23:45", "22:45",
               "0:00", "1:00", "6:15", "0:00", "1:00",
               "0:00", "2:00", "2:00", "2:00", "1:45"),
  survey = c(1, 4, 6, 9, 11, 1, 3, 5, 8, 11,
             1, 4, 7, 9, 12, 1, 4, 7, 10, 12)),
  row.names = c(NA, -20L), class = "data.frame")

PP_rec indicates whether the survey was completed before or after an event (it is not included in this sample).
asked by o.y_80

1 Answer


As suggested in this answer, you can convert the time with the hms package.

Using your sample data:

# load the hms package
library(hms)

# as_hms() parses "HH:MM:SS", so append ":00" seconds to each "H:MM" value
data_clean$distress_time <- as_hms(paste0(data_clean$distress, ':00'))

The times on the y-axis are then sorted chronologically (earliest at the bottom). One drawback is that the labels also show the seconds.
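If the seconds are unwanted, `scale_y_time()` accepts a `labels` function. A minimal sketch, assuming the hms conversion above: hms values are stored as seconds since midnight, so a small labeller can print them as "H:MM", and `breaks` can request one tick per hour. `seconds_to_hm()` and `plot_distress_hms()` are hypothetical names, not part of hms or ggplot2.

```r
# Hypothetical labeller: hms values are seconds since midnight,
# so integer division gives hours and the remainder gives minutes.
seconds_to_hm <- function(x) {
  secs <- as.integer(round(as.numeric(x)))
  sprintf("%d:%02d", secs %/% 3600L, (secs %% 3600L) %/% 60L)
}

# Hypothetical plotting function using the hms column from this answer,
# with hourly breaks and "H:MM" labels instead of "HH:MM:SS".
plot_distress_hms <- function(df) {
  df$distress_time <- hms::as_hms(paste0(df$distress, ":00"))
  ggplot2::ggplot(df, ggplot2::aes(x = survey, y = distress_time,
                                   group = id, color = id)) +
    ggplot2::facet_wrap(~PP_rec) +
    ggplot2::geom_smooth(method = "lm", formula = y ~ x, se = FALSE,
                         linewidth = 0.5) +
    ggplot2::scale_y_time(breaks = hms::hms(hours = 0:23),
                          labels = seconds_to_hm) +
    ggplot2::labs(x = "Survey", y = "Distress") +
    ggplot2::theme(legend.position = "none")
}

seconds_to_hm(c(0, 3600, 22500, 85500))  # "0:00" "1:00" "6:15" "23:45"
```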

  • Thank you! It seems to work (the seconds are not necessary, but I think I can keep them). However, I'd like to see all the hours on the y-axis, and it only shows 00:00:00, 10:00:00 and 20:00:00. How can I modify the y-axis to show every hour, with 1-hour breaks? – o.y_80 Mar 12 '23 at 15:31
  • @o.y_80 You can try `scale_y_time`. See [documentation](https://ggplot2.tidyverse.org/reference/scale_date.html). – jimmymcheung Mar 12 '23 at 23:30
  • I now get a y-axis from 0:00:00 to 23:00:00, and the smoothed line sits in the middle, in the afternoon hours, which makes no sense. I'd like the y-axis to include only times from 19:00 to 07:00, starting at 19:00. I think I'll open a new question about it. – o.y_80 Mar 13 '23 at 06:35
  • @o.y_80 You can set `ylim` to limit the range. If you want late hours at the bottom and early hours at the top, you need to reorder y in descending order. – jimmymcheung Mar 13 '23 at 11:49
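For a window from 19:00 to 07:00, an alternative to reversing the axis is to treat times after midnight as belonging to the next day, so the night runs continuously upward from 19 to 31 hours. Then 23:45 and 0:00 sit next to each other instead of at opposite ends of the axis, which is what pulls the fitted lines into the afternoon. The sketch below assumes that any time before noon belongs to the night that started the previous evening (`cutoff = 12` is an assumption to adjust), and it uses `coord_cartesian()` rather than `ylim` because `ylim` removes out-of-range points before the `lm` fits are computed, while `coord_cartesian()` only zooms. `hm_to_hours()`, `night_hours()`, `night_label()` and `plot_distress_night()` are hypothetical names.

```r
# Hypothetical helper: decimal hours since midnight for "H:MM" strings.
hm_to_hours <- function(x) {
  parts <- strsplit(as.character(x), ":", fixed = TRUE)
  vapply(parts, function(p) as.numeric(p[1]) + as.numeric(p[2]) / 60,
         numeric(1))
}

# Times before `cutoff` are shifted by 24 h, so 0:00 -> 24 and 6:15 -> 30.25,
# while 19:00 stays 19. The noon cutoff is an assumption.
night_hours <- function(x, cutoff = 12) {
  h <- hm_to_hours(x)
  ifelse(h < cutoff, h + 24, h)
}

# Labels wrap back onto a 24-hour clock, e.g. 25.5 -> "1:30".
night_label <- function(h) {
  total_min <- as.integer(round((h %% 24) * 60))
  sprintf("%d:%02d", total_min %/% 60L, total_min %% 60L)
}

# Hypothetical plotting function: 19:00 at the bottom, 07:00 at the top,
# one break per hour; coord_cartesian() zooms without dropping data.
plot_distress_night <- function(df) {
  df$distress_night <- night_hours(df$distress)
  ggplot2::ggplot(df, ggplot2::aes(x = survey, y = distress_night,
                                   group = id, color = id)) +
    ggplot2::facet_wrap(~PP_rec) +
    ggplot2::geom_smooth(method = "lm", formula = y ~ x, se = FALSE,
                         linewidth = 0.5) +
    ggplot2::stat_smooth(ggplot2::aes(group = 1), method = "lm",
                         formula = y ~ x, se = FALSE, colour = "black",
                         linewidth = 1) +
    ggplot2::scale_y_continuous(breaks = 19:31, labels = night_label) +
    ggplot2::coord_cartesian(ylim = c(19, 31)) +
    ggplot2::labs(x = "Survey", y = "Distress") +
    ggplot2::theme(legend.position = "none")
}

night_label(night_hours(c("19:00", "23:45", "0:00", "6:15")))
# "19:00" "23:45" "0:00" "6:15"
```

To put 19:00 at the top instead, `scale_y_reverse()` should accept the same `breaks` and `labels` arguments in place of `scale_y_continuous()`.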