
I have a data frame with a Date column and a column of calculated concentrations of a parameter. I am trying to make a time-series plot that shows all the concentrations as scatter points, plus a horizontal line at the standard for the pollutant (which is 500). That part works fine. The problem is plotting a line that shows the duration of each exceedance (> 500). I can't find anything that answers this, and I would appreciate any guidance.

Sample Data:

df<-structure(list(Date_Time = structure(c(1480093200, 1482660000, 
1395651343, 1329823800, 1326929400, 1331233200, 1490130000, 1476138600, 
1474070400, 1489393800, 1483272000, 1393515068, 1480471200, 1332680400, 
1471226400, 1470853800, 1396124591, 1496250000, 1394581991, 1438177553, 
1332108000, 1493051400, 1475949600, 1491024600, 1488832200, 1473697800, 
1475404200, 1488511800, 1490212800, 1477040400, 1494793740, 1389346885, 
1473933600, 1390611191, 1486551600, 1476475200, 1473593400, 1388854543, 
1327012200, 1493611140), class = c("POSIXct", "POSIXt"), tzone = "UTC"), 
    Calculated_TDS = c(271.3692, 634.3604, 634.246, 219.546, 
    674.286, 169.21, 506.118, 452.6932, 314.8412, 4640.3052, 
    358.0844, 734.918, 97.71, 460.358, 385.998, 283.9532, 370.554, 
    309.2356, 296.766, 137.079616, 24.494, 383.996, 321.2476, 
    784.6248, 642.1396, 1320.7032, 213.254, 462.1884, 547.6452, 
    376.274, 195.1216, 595.35, 320.1608, 411.166, 882.5512, 288.5292, 
    533.574, 1000.326, 124.022, 256.6116)), row.names = c(NA, 
-40L), class = c("tbl_df", "tbl", "data.frame"), .Names = c("Date_Time", 
"Calculated_TDS"))

Code:

library(tidyverse)

test_df<-df%>%
    mutate(greater = Calculated_TDS > 500)%>%
    group_by(Date_Time,Calculated_TDS)%>%
    summarize(n_greater = sum(greater), duration = length(Date_Time))


plot<-ggplot() +
  geom_point(data = test_df , aes(x = Date_Time, y = Calculated_TDS))+
  geom_line(data= test_df,aes(x=Date_Time, y = duration),stat="identity")+
  geom_hline(aes(yintercept = 500,color="red"),size=1.3)

plot

I know what I have doesn't make sense but I don't understand how to find the duration of exceedances.

[Image: sketch of the expected output]

NBE
  • FYI, your code didn't run. Can you draw your expected output in Paint or Word? – Tung Nov 19 '18 at 17:49
  • Are you looking for something similar to flow duration curve? https://stackoverflow.com/questions/52831687/flow-duration-curve-fdc-extract-low-threshold – Tung Nov 19 '18 at 17:51
  • @Tung fixed code sorry, should work now. That's not quite what I want. I will try to draw expected output – NBE Nov 19 '18 at 17:53
  • @Tung I included expected output. The lines should be the time duration of each event that went over the standard of 500. – NBE Nov 19 '18 at 18:04
  • Can you please define for us what an "event" is? Let's say a reading is over 500, as is the case about 35% of the time in your sample. When should the duration start and when should it end? Does it last as long as the readings continue to be over 500? In your drawing it looks like there is grouping that includes readings before and after the > 500 readings. – Jon Spring Nov 19 '18 at 18:31
  • @JonSpring An event starts when a reading goes above 500 and ends when the readings are no longer above 500. I am after the time duration of how long each exceedance lasts. Sorry if I wasn't clear enough. – NBE Nov 19 '18 at 18:44

2 Answers


I am not quite sure what you want, but here is a starting point. The idea is simply to make separate columns for the exceeding values and the below-standard values, filled with NA where they do not apply, and then plot them. The exceeding points and line are colored red, and the below points blue. Note that color = "red" should be outside aes() for the horizontal line. Only use color inside aes() when the color should vary with the data.


library(tidyverse)

test_df <- df %>% 
  mutate(greater = Calculated_TDS > 500, 
         # keep the value only where it applies, NA otherwise
         exceed_value = if_else(greater, Calculated_TDS, NA_real_), 
         below_value = if_else(greater, NA_real_, Calculated_TDS))

plot <- ggplot(data = test_df, aes(x = Date_Time)) + 
  geom_point(aes(y = exceed_value), color = "red") + 
  geom_point(aes(y = below_value), color = "blue") + 
  geom_line(aes(y = exceed_value), color = "red") + 
  geom_hline(aes(yintercept = 500), color = "red", size = 1.3)

print(plot)
#> Warning in as.POSIXlt.POSIXct(x): unknown timezone 'zone/tz/2018g.1.0/
#> zoneinfo/America/New_York'
#> Warning: Removed 26 rows containing missing values (geom_point).
#> Warning: Removed 14 rows containing missing values (geom_point).
#> Warning: Removed 4 rows containing missing values (geom_path).
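
To see why the placement of color = "red" matters, here is a small sketch that is not part of the original code (p_inside, p_outside, col_inside and col_outside are illustrative names). It builds the same horizontal line both ways and inspects the colour ggplot2 actually resolves for each:

```r
library(ggplot2)

# color inside aes(): "red" is treated as a data value and mapped through
# the default discrete colour scale (it also produces a legend)
p_inside <- ggplot() +
  geom_hline(aes(yintercept = 500, color = "red"))

# color outside aes(): "red" is used directly as the line colour
p_outside <- ggplot() +
  geom_hline(yintercept = 500, color = "red")

# Pull the resolved colour out of the built layer data
col_inside  <- ggplot_build(p_inside)$data[[1]]$colour
col_outside <- ggplot_build(p_outside)$data[[1]]$colour

print(col_inside)   # a colour from the default palette, not the literal "red"
print(col_outside)  # the literal "red"
```

With the constant inside aes(), the line is drawn in whatever the default palette assigns to the single level "red", which is why the literal colour belongs outside.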

Andrew Lavers
  • Thanks for your answer. Is there a way to get how long each event lasted in a new column? – NBE Nov 19 '18 at 19:01

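Andrew Lavers' answer is a good one. An alternative approach is to make groups for your geom_line and subset the data within that geom. This uses the test_df from the question (the one with the n_greater column), which is already sorted by Date_Time.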

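# Start a new group every time n_greater switches between 0 and 1
test_df$group <- paste0("Group_", cumsum(c(1, diff(test_df$n_greater) != 0)))
# For readings above 500, the time until the next reading (diff() picks the
# units automatically, hours for this data); pad with NA so the lengths match
test_df$duration <- ifelse(test_df$n_greater == 1,
                           c(as.numeric(diff(test_df$Date_Time)), NA), 0)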

# A tibble: 40 x 5
# Groups:   Date_Time [?]
   Date_Time           Calculated_TDS n_greater duration group  
   <dttm>                       <dbl>     <int>    <dbl> <chr>  
 1 2012-01-18 23:30:00          674.          1      23  Group_1
 2 2012-01-19 22:30:00          124.          0       0  Group_2
 3 2012-02-21 11:30:00          220.          0       0  Group_2
 4 2012-03-08 19:00:00          169.          0       0  Group_2
 5 2012-03-18 22:00:00           24.5         0       0  Group_2
 6 2012-03-25 13:00:00          460.          0       0  Group_2
 7 2014-01-04 16:55:43         1000.          1     137. Group_3
 8 2014-01-10 09:41:25          595.          1     351. Group_3
 9 2014-01-25 00:53:11          411.          0       0  Group_4
10 2014-02-27 15:31:08          735.          1     296. Group_5
# ... with 30 more rows
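
Because diff() on a POSIXct column chooses its units automatically (hours for this data), it may be safer to state the units explicitly. A minimal sketch using the first three timestamps from the output above; times, gap_hours and gap_hours_padded are illustrative names, not part of the original code:

```r
# First three timestamps from the output above
times <- as.POSIXct(c("2012-01-18 23:30:00",
                      "2012-01-19 22:30:00",
                      "2012-02-21 11:30:00"), tz = "UTC")

# Gap from each reading to the next one, always in hours
gap_hours <- as.numeric(difftime(times[-1], times[-length(times)],
                                 units = "hours"))

# Pad with NA so the result lines up with the original rows
gap_hours_padded <- c(gap_hours, NA_real_)

print(gap_hours_padded)
```

The first gap is 23 hours, matching the duration shown for the first row of the output.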

ggplot() +
  geom_point(data = test_df, aes(x = Date_Time, y = Calculated_TDS)) +
  # only draw lines through readings above 500, one line per group
  geom_line(data = subset(test_df, Calculated_TDS > 500),
            aes(x = Date_Time, y = Calculated_TDS, group = group)) +
  # color is a constant, so it goes outside aes()
  geom_hline(yintercept = 500, color = "red", size = 1.3)


Anonymous coward
  • thanks for your answer. Is there a way to make a new column saying how long each exceedance lasted? – NBE Nov 19 '18 at 19:06
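
One possible way to get a duration per exceedance, sketched under the definition given in the comments (an event is a run of consecutive readings above 500). The toy data and the names toy, exceed, run_id and events are made up for illustration. Measuring the duration from the first to the last reading in a run is an assumption, so a run with a single reading gets a duration of zero:

```r
library(dplyr)

# Hypothetical hourly readings, not the data from the question
toy <- tibble(
  Date_Time = as.POSIXct("2018-01-01 00:00:00", tz = "UTC") + 3600 * 0:7,
  Calculated_TDS = c(300, 600, 700, 400, 550, 200, 800, 900)
)

events <- toy %>%
  arrange(Date_Time) %>%
  mutate(
    exceed = Calculated_TDS > 500,
    # the run id increases every time the exceed flag flips
    run_id = cumsum(exceed != lag(exceed, default = first(exceed)))
  ) %>%
  filter(exceed) %>%          # keep only the runs above the standard
  group_by(run_id) %>%
  summarise(
    start = min(Date_Time),
    end = max(Date_Time),
    n_readings = n(),
    # assumption: duration is first-to-last reading within the run
    duration_hours = as.numeric(difftime(max(Date_Time), min(Date_Time),
                                         units = "hours"))
  )

print(events)
```

If the duration should instead run until the first reading back below 500, the end of each event would have to be taken from the next row after the run, which this sketch does not do.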