I have 2 data frames:

df1 <- setNames(data.frame(as.POSIXct(c("2022-07-29 00:00:00","2022-07-29 00:05:00","2022-07-29 00:10:00","2022-07-29 00:15:00","2022-07-29 00:20:00")), c(1,2,3,4,5)), c("timeStamp", "value"))
df2 <- setNames(data.frame(as.POSIXct(c("2022-07-29 00:00:05","2022-07-29 00:05:05","2022-07-29 00:20:05")), c("a","b","c")), c("timeStamp", "text"))

I want to plot them so that the main graph is a geom_point with a numerical y scale, and then overlay the labels (a, b, c) from the second data frame at the correct timeStamps on the continuous time series x axis.

ggplot() + 
  geom_point(data=df1, aes(x=timeStamp, y= value)) +
  geom_text(data=df2, aes(x=timeStamp, y= text))

I think the difficulty lies in the fact that the timeStamps do not match up perfectly, and I keep getting "Error: Discrete value supplied to continuous scale". Can anybody please offer some advice here?

The end result should look something like this (this is an example from a much larger data frame): [image: labeled time series using labels from a different time series data frame]

Thank you

ambergris

2 Answers

The issue is not the timeStamp. In geom_point you map a numeric (continuous) variable to y, while in geom_text you map a discrete one (text) to the same y aesthetic. Both layers share one y scale, hence you get the error:

Error: Discrete value supplied to continuous scale

To fix that, map your text to the label aesthetic (which, by the way, is required for geom_text) and use the y aesthetic to specify the vertical position where you want the labels to appear:

library(ggplot2)

ggplot() + 
  geom_point(data=df1, aes(x=timeStamp, y= value)) +
  geom_text(data=df2, aes(x=timeStamp, label = text, y = 6))

DATA

df1 <- setNames(data.frame(as.POSIXct(c("2022-07-29 00:00:00","2022-07-29 00:05:00","2022-07-29 00:10:00","2022-07-29 00:15:00","2022-07-29 00:20:00")), c(1,2,3,4,5)), c("timeStamp", "value"))
df2 <- setNames(data.frame(as.POSIXct(c("2022-07-29 00:00:05","2022-07-29 00:05:05","2022-07-29 00:20:05")), c("a","b","c")), c("timeStamp", "text"))
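
The hard-coded y = 6 is specific to this toy data. As one possible variation (a sketch, not the only way to do it), y = Inf pins each label to the top edge of the panel regardless of the value range, and geom_vline marks the event time so the label relates only to the x axis. The tz = "UTC" argument, the dashed grey line style, and the vjust value are assumptions made for illustration:

```r
library(ggplot2)

# Same toy data as in the question; tz = "UTC" is an assumption
df1 <- data.frame(
  timeStamp = as.POSIXct(c("2022-07-29 00:00:00", "2022-07-29 00:05:00",
                           "2022-07-29 00:10:00", "2022-07-29 00:15:00",
                           "2022-07-29 00:20:00"), tz = "UTC"),
  value = c(1, 2, 3, 4, 5)
)
df2 <- data.frame(
  timeStamp = as.POSIXct(c("2022-07-29 00:00:05", "2022-07-29 00:05:05",
                           "2022-07-29 00:20:05"), tz = "UTC"),
  text = c("a", "b", "c"),
  stringsAsFactors = FALSE
)

p <- ggplot() +
  geom_point(data = df1, aes(x = timeStamp, y = value)) +
  # Dashed vertical line at each event time
  geom_vline(data = df2, aes(xintercept = timeStamp),
             linetype = "dashed", colour = "grey60") +
  # y = Inf places the label at the top edge of the panel;
  # vjust > 1 moves it down so it sits just inside the panel
  geom_text(data = df2, aes(x = timeStamp, y = Inf, label = text),
            vjust = 1.5)
p
```

Because the labels are positioned at the panel edge, they do not need a matching row in df1, so data frames of different sizes are not a problem here.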

stefan
  • Fantastic - thank you!! I have one more question: is it also possible to specify which points get labels? My real data is far larger and the labels are very crowded. Is there a way to include in your code here "exclude all 'l,m,n,o,p' variables"? Or even exclude labels before timeStamp x? Perhaps this warrants another question, but I thought I'd ask in case there is an easy fix – ambergris Sep 16 '22 at 09:52
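
On the follow-up about crowded labels: one approach that might work is to filter df2 before handing it to geom_text, so that only the remaining rows are drawn. A minimal sketch, where drop_labels and cutoff are hypothetical criteria chosen for illustration (as is tz = "UTC"):

```r
library(ggplot2)

# Same toy data as in the question; tz = "UTC" is an assumption
df1 <- data.frame(
  timeStamp = as.POSIXct(c("2022-07-29 00:00:00", "2022-07-29 00:05:00",
                           "2022-07-29 00:10:00", "2022-07-29 00:15:00",
                           "2022-07-29 00:20:00"), tz = "UTC"),
  value = c(1, 2, 3, 4, 5)
)
df2 <- data.frame(
  timeStamp = as.POSIXct(c("2022-07-29 00:00:05", "2022-07-29 00:05:05",
                           "2022-07-29 00:20:05"), tz = "UTC"),
  text = c("a", "b", "c"),
  stringsAsFactors = FALSE
)

# Hypothetical filter criteria, for illustration only
drop_labels <- c("b")                                    # labels to hide
cutoff <- as.POSIXct("2022-07-29 00:01:00", tz = "UTC")  # hide labels before this time

# Keep only labels that are not in drop_labels and not earlier than cutoff
df2_shown <- subset(df2, !(text %in% drop_labels) & timeStamp >= cutoff)

ggplot() +
  geom_point(data = df1, aes(x = timeStamp, y = value)) +
  geom_text(data = df2_shown, aes(x = timeStamp, label = text, y = 6))
```

With the toy data this leaves only the label "c": "a" falls before the cutoff and "b" is in the exclusion list. For the real data, drop_labels would be something like c("l", "m", "n", "o", "p").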
Update: removed my first answer.

I am still not sure what you are after, and @stefan's answer seems more correct, but maybe you are thinking of something like this:

If you want to position the labels from df2 on top of the points from df1, matching each point to the nearest time point in df2, then a rolling join with roll = "nearest" from data.table can do that. This answer was adapted from "Merging two sets of data by data.table roll='nearest' function".

library(data.table)
library(tidyverse)

setDT(df1)
setDT(df2)

# Create time column by which to do a rolling join
df1[, time := timeStamp]
df2[, time := timeStamp]
setkey(df1, time)
setkey(df2, time)

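# For each row of df1, look up the row of df2 with the nearest time
set_merged <- df2[df1, roll = "nearest"]

# Plot points and line, with every matched label in one row just above the highest value
set_merged %>% 
  as_tibble() %>% 
  ggplot(aes(x = time, y=value, group=1)) + 
  geom_point() +
  geom_line()+
  geom_text(aes(x=time, y=max(value)+0.1, label=text))+
  theme_minimal()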
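
Note that df2[df1, roll = "nearest"] returns one row per point in df1, so every point receives a label. If one label per event is wanted instead, the join could be reversed so that each row of df2 picks up the value of its nearest point in df1. A minimal sketch under that assumption; the name labels_at_points, the 0.3 vertical offset, and tz = "UTC" are illustrative choices, not part of the original code:

```r
library(data.table)
library(ggplot2)

# Same toy data as in the question; tz = "UTC" is an assumption
df1 <- data.table(
  timeStamp = as.POSIXct(c("2022-07-29 00:00:00", "2022-07-29 00:05:00",
                           "2022-07-29 00:10:00", "2022-07-29 00:15:00",
                           "2022-07-29 00:20:00"), tz = "UTC"),
  value = c(1, 2, 3, 4, 5)
)
df2 <- data.table(
  timeStamp = as.POSIXct(c("2022-07-29 00:00:05", "2022-07-29 00:05:05",
                           "2022-07-29 00:20:05"), tz = "UTC"),
  text = c("a", "b", "c")
)

# One row per label: for each row of df2, take the value from the df1 row
# whose timeStamp is nearest. The timeStamp column in the result is df2's.
labels_at_points <- df1[df2, on = "timeStamp", roll = "nearest"]

ggplot(df1, aes(x = timeStamp, y = value)) +
  geom_point() +
  # Draw each label slightly above the value of its nearest point
  geom_text(data = labels_at_points,
            aes(y = value + 0.3, label = text)) +
  theme_minimal()
```

Here labels_at_points has three rows (one per event), so points without a nearby event stay unlabeled.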

TarJae
  • Not exactly what I'm looking for. I simply want to note on the regular geom_point graph where important events have happened in a huge series of data. I do not want to have the text on the y axis. It really should only correlate to the timeStamp x axis. I also edited the original post - it is probably important to note that the data frames are of different sizes, so there won't be a label for every point of data from df1 – ambergris Sep 16 '22 at 07:57