I have data frames consisting of two columns: light intensity (par) and time, with each row representing a 'snapshot' in time.
The data is effectively a time series. The attached image shows 13 of the data frames, each plotted with plot(df$time, df$par).
I have been trying to write a function that will return the data points representing dawn (first light of the day) and dusk (last light of the day).
On the graph, these would be the first and last points of each "flat" area, where par should be equal to 0.
I've been able to identify most of them with an atrocious loop that basically looks for streaks of readings at min(par), and when a streak reaches 100 rows it saves the index i - 100:
dawnDusk <- function(df) {
  # Baseline "dark" reading; na.rm so missing readings don't turn it into NA
  min.par <- min(df$par, na.rm = TRUE)
  day.streak <- 0
  night.streak <- 0
  dawn <- c()
  dusk <- c()
  for (i in seq_len(nrow(df))) {
    # Missing readings and readings within 0.1 of the baseline count as night
    if (is.na(df$par[i]) || df$par[i] <= min.par + 0.1) {
      night.streak <- night.streak + 1
      day.streak <- 0
    } else {
      day.streak <- day.streak + 1
      night.streak <- 0
    }
    # the 100 comes from the shortest night in the year (~5 hours)
    # divided into 3 minute intervals
    if (night.streak == 100) {
      dusk <- c(dusk, i - 100)
    } else if (day.streak == 100 && i != 100) {
      dawn <- c(dawn, i - 100)
    }
  }
  # Return the row indices found for each transition
  list(dawn = dawn, dusk = dusk)
}
I've also tried differencing it, but it came out funky...
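For comparison, here is a minimal sketch of a vectorised alternative to the loop, using run-length encoding of a night/day indicator instead of differencing par directly. It assumes the same 0.1 tolerance and 100-row minimum streak as above; find_night_runs() and its arguments are hypothetical names made up for this illustration, not from any package.

```r
# Hypothetical helper: returns the first and last row index of every
# "flat" (night) stretch that lasts at least min_run rows.
find_night_runs <- function(par, tol = 0.1, min_run = 100) {
  baseline <- min(par, na.rm = TRUE)
  # TRUE where the reading is missing or within tol of the baseline
  night <- is.na(par) | par <= baseline + tol
  runs <- rle(night)
  ends <- cumsum(runs$lengths)
  starts <- ends - runs$lengths + 1
  # Keep only night runs long enough to be a real night
  keep <- runs$values & runs$lengths >= min_run
  list(night_start = starts[keep], night_end = ends[keep])
}

# Made-up readings: day, night, day with one stray 0, then night again
par <- c(rep(5, 10), rep(0, 150), rep(5, 200), 0, rep(5, 50), rep(0.05, 120))
res <- find_night_runs(par)
```

The stray zero in the middle of the day is dropped because its run is shorter than min_run. A night that is cut off by the start or end of the series would still be reported, so its first or last index is not necessarily a true dusk or dawn, and a spurious daytime reading in the middle of a night would still split that night in two.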
The main issues that arise are that the instrument that measures par can be inaccurate (woohoo!), so the minimum can sometimes vary by +/- 0.1 from its baseline. It can also read 0 sometimes when it shouldn't (seen in the 7th graph).
One thing that might be helpful is that once you have found 1 correct dawn and 1 correct dusk point, you know that every other dawn or dusk should be roughly a multiple of 24 hours away from it.
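A minimal sketch of that 24-hour check, assuming the time column is POSIXct. The helper name near_daily_multiple(), the 1-hour tolerance, and the example timestamps are all made up for illustration.

```r
# Hypothetical helper: TRUE for each time that lies within tol_hours of
# a whole multiple of 24 hours away from a trusted reference time.
near_daily_multiple <- function(times, ref, tol_hours = 1) {
  offset_hours <- as.numeric(difftime(times, ref, units = "hours"))
  remainder <- offset_hours %% 24
  # Distance to the nearest multiple of 24, whichever side it is on
  pmin(remainder, 24 - remainder) <= tol_hours
}

# Made-up reference dawn and four candidate times relative to it
ref <- as.POSIXct("2020-06-01 05:00:00", tz = "UTC")
candidates <- ref + c(24, 48.5, 60, -23.8) * 3600
ok <- near_daily_multiple(candidates, ref)
```

Candidates that fail the check (here the one 60 hours after the reference) could then be discarded as false detections. Since dawn and dusk drift by a few minutes per day, the tolerance would probably need to grow with the number of days between the candidate and the reference.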
This is my first time posting on Stack Overflow. I have spent hours looking at every post related to this topic, to no avail. I'm hoping someone can give me a push in the right direction. Thanks in advance!