R: How to ggplot two time series dataframes, one of which contains non-numerical labels and slightly different timeStamps

Question

I have 2 data frames:

df1 <- setNames(data.frame(as.POSIXct(c("2022-07-29 00:00:00","2022-07-29 00:05:00","2022-07-29 00:10:00","2022-07-29 00:15:00","2022-07-29 00:20:00")), c(1,2,3,4,5)), c("timeStamp", "value"))
df2 <- setNames(data.frame(as.POSIXct(c("2022-07-29 00:00:05","2022-07-29 00:05:05","2022-07-29 00:20:05")), c("a","b","c")), c("timeStamp", "text"))

I want to plot them so that the main graph is a geom_point with a numerical y scale, and then overlay the labels (a, b, c) from the second data frame at the correct timeStamps on the continuous time-series x axis.

ggplot() + 
  geom_point(data=df1, aes(x=timeStamp, y= value)) +
  geom_text(data=df2, aes(x=timeStamp, y= text))

I think the difficulty lies in the fact that the timeStamps do not match up exactly, and I keep getting "Error: Discrete value supplied to continuous scale". Can anybody please offer some advice here?

The end result should look something like this (this is an example from a much larger data frame):

[Image: labeled time series using labels from a different time-series data frame]

Thank you

Do I understand you correctly: You want to label points 1-3 from df1 with labels from df2 text column? — TarJae, Sep 16 '22 at 08:00
Not exactly. I want the labels from df2 shown on the graph based on their timeStamps on the x axis — ambergris, Sep 16 '22 at 08:03
Do the rows correspond between df1 and df2? That is, is row 10 in df1 also row 10 in df2? — TarJae, Sep 16 '22 at 08:07
No, df2 has fewer rows than df1, and its rows fall sporadically within the whole time series of df1 — ambergris, Sep 16 '22 at 08:10

score 1 · Accepted Answer · answered Sep 16 '22 at 08:34

The issue is not the timeStamp. For the geom_point you are mapping a numeric (continuous) variable on y, while for the geom_text you are mapping a discrete one on y. Hence you get the error:

Error: Discrete value supplied to continuous scale

To fix that, map your text to the label aesthetic (which, by the way, is required for geom_text) and use the y aesthetic to specify the position where you want to add the labels:

library(ggplot2)

ggplot() + 
  geom_point(data=df1, aes(x=timeStamp, y= value)) +
  geom_text(data=df2, aes(x=timeStamp, label = text, y = 6))
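The y = 6 above is hard-coded for the five-point sample. A minimal sketch, assuming you would rather derive the label height from df1 (the offset of 1 above the highest point is an arbitrary, hypothetical choice), and using ggplot_build() to confirm that the text layer receives all three labels:

```r
library(ggplot2)

# Sample data from the question, with as.POSIXct() applied to one vector
df1 <- data.frame(
  timeStamp = as.POSIXct(c("2022-07-29 00:00:00", "2022-07-29 00:05:00",
                           "2022-07-29 00:10:00", "2022-07-29 00:15:00",
                           "2022-07-29 00:20:00")),
  value = c(1, 2, 3, 4, 5)
)
df2 <- data.frame(
  timeStamp = as.POSIXct(c("2022-07-29 00:00:05", "2022-07-29 00:05:05",
                           "2022-07-29 00:20:05")),
  text = c("a", "b", "c")
)

# Hypothetical offset: draw all labels one unit above the highest point
label_y <- max(df1$value) + 1

p <- ggplot() +
  geom_point(data = df1, aes(x = timeStamp, y = value)) +
  # text goes to label; y is a constant, so only x depends on df2
  geom_text(data = df2, aes(x = timeStamp, y = label_y, label = text))

p
```

Because the labels only take their x position from df2, the timestamps in df1 and df2 never have to match.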

DATA

df1 <- setNames(data.frame(as.POSIXct(c("2022-07-29 00:00:00","2022-07-29 00:05:00","2022-07-29 00:10:00","2022-07-29 00:15:00","2022-07-29 00:20:00")), c(1,2,3,4,5)), c("timeStamp", "value"))
df2 <- setNames(data.frame(as.POSIXct(c("2022-07-29 00:00:05","2022-07-29 00:05:05","2022-07-29 00:20:05")), c("a","b","c")), c("timeStamp", "text"))

Fantastic, thank you!! I have one more question: is it also possible to specify which points get labels? My real data is far larger and the labels are very crowded. Is there a way to include in your code here "exclude all 'l, m, n, o, p' labels"? Or even exclude labels before timeStamp x? Perhaps this warrants another question, but I thought I'd ask in case there is an easy fix — ambergris, Sep 16 '22 at 09:52
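For the follow-up about crowded labels, one option is to filter df2 before passing it to geom_text(), since only the rows left in the label data frame get drawn. A sketch using base R subset(), where drop_labels and cutoff are hypothetical placeholders (with the real data, drop_labels would be something like c("l", "m", "n", "o", "p")):

```r
library(ggplot2)

df1 <- data.frame(
  timeStamp = as.POSIXct(c("2022-07-29 00:00:00", "2022-07-29 00:05:00",
                           "2022-07-29 00:10:00", "2022-07-29 00:15:00",
                           "2022-07-29 00:20:00")),
  value = c(1, 2, 3, 4, 5)
)
df2 <- data.frame(
  timeStamp = as.POSIXct(c("2022-07-29 00:00:05", "2022-07-29 00:05:05",
                           "2022-07-29 00:20:05")),
  text = c("a", "b", "c")
)

# Hypothetical filter settings: labels to exclude and the earliest time to keep
drop_labels <- c("b")
cutoff <- as.POSIXct("2022-07-29 00:01:00")

# Keep labels that are not excluded and occur at or after the cutoff
df2_shown <- subset(df2, !(text %in% drop_labels) & timeStamp >= cutoff)

p <- ggplot() +
  geom_point(data = df1, aes(x = timeStamp, y = value)) +
  geom_text(data = df2_shown, aes(x = timeStamp, y = 6, label = text))

p
```

With the sample data, "a" falls before the cutoff and "b" is excluded, so only "c" is drawn. If the remaining labels still overlap, geom_text() also accepts check_overlap = TRUE, which skips text that overlaps labels already drawn in the same layer.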

TarJae · Answer 2 · 2022-09-16T08:37:54.343

Update: removed my first answer.

I am still not sure what you need. Also, @stefan's answer seems more correct, but maybe you are thinking of something like this:

If you want to position the labels from df2 on top of the points from df1 based on the nearest time points between df1 and df2, you can use a rolling join (roll = "nearest") from data.table. This answer was adapted from the question "Merging two sets of data by data.table roll='nearest' function".

library(data.table)
library(tidyverse)

setDT(df1)
setDT(df2)

# Create time column by which to do a rolling join
df1[, time := timeStamp]
df2[, time := timeStamp]
setkey(df1, time)
setkey(df2, time)

set_merged <- df2[df1, roll = "nearest"]
set_merged %>% 
  as_tibble() %>% 
  ggplot(aes(x = time, y=value, group=1)) + 
  geom_point() +
  geom_line()+
  geom_text(aes(x=time, y=max(value)+0.1, label=text))+
  theme_minimal()
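Note that df2[df1, roll = "nearest"] looks up the nearest df2 label for every row of df1, so with the sample data the five points get the labels a, b, b, c, c. If each df2 label should appear only once, here is a base R sketch that swaps the data.table rolling join for a plain which.min() nearest match: each df2 timestamp is matched to its closest df1 point and the label is drawn just above it (the nudge_y value is an arbitrary choice):

```r
library(ggplot2)

df1 <- data.frame(
  timeStamp = as.POSIXct(c("2022-07-29 00:00:00", "2022-07-29 00:05:00",
                           "2022-07-29 00:10:00", "2022-07-29 00:15:00",
                           "2022-07-29 00:20:00")),
  value = c(1, 2, 3, 4, 5)
)
df2 <- data.frame(
  timeStamp = as.POSIXct(c("2022-07-29 00:00:05", "2022-07-29 00:05:05",
                           "2022-07-29 00:20:05")),
  text = c("a", "b", "c")
)

# Timestamps as seconds so absolute differences are plain numbers
t1 <- as.numeric(df1$timeStamp)
t2 <- as.numeric(df2$timeStamp)

# For each df2 timestamp, the index of the df1 row with the closest timestamp
nearest_idx <- vapply(t2, function(t) which.min(abs(t1 - t)), integer(1))

# Give each label the value of its matched point so it sits on that point
df2$value <- df1$value[nearest_idx]

p <- ggplot(df1, aes(x = timeStamp, y = value)) +
  geom_point() +
  geom_line() +
  # df2 now has timeStamp and value, so the inherited x/y mappings apply
  geom_text(data = df2, aes(label = text), nudge_y = 0.2)

p
```

This is a linear scan per label, which is fine for moderate sizes; for very large data the data.table rolling join above scales better.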

Not exactly what I'm looking for. I simply want to mark on the regular geom_point graph where important events have happened in a huge series of data. I do not want the text on the y axis; it should only correspond to the timeStamp x axis. I also edited the original post: it is probably important to note that the data frames are of different sizes, so there won't be a label for every data point in df1 — ambergris, Sep 16 '22 at 07:57
