I have created the following plot based on air quality data over three years of observation, and would like to know if these slopes are different across the two time periods (March-June 2018-2019 average vs. March-June 2020):
A snapshot of my data frame is shown here:
The figure is made using the following code:
library(ggplot2)

Lockdown_Period_plot_weekday <- ggplot(COVID_NO2_weekday_avgs_Rathmines,
                                       aes(x = Date_1, y = avg_daily_Rath_NO2,
                                           color = Period, shape = Period)) +
  geom_smooth(method = "lm", se = FALSE) +  # one straight-line fit per Period
  geom_point(size = 2) +
  theme_bw() +
  labs(x = 'Date',
       y = 'Daily Avg [NO2] µg/m^3',
       title = 'Weekday NO2 Trends During Lockdown',
       subtitle = 'Rathmines AQ Station')
I know that I need to remove the effect of serial correlation first (as the independent variable is a time series), but I'm not exactly sure how to do this. Should I use the date column to do so? Or should I use the dummy column Date_2 to do this? This column is just a concatenation of Month.Date to create a series of x values that are numerical and continuous.
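To make the Date_2 question concrete, here is a minimal sketch with made-up dates (not my actual data). It assumes the Month.Date concatenation produces values like 3.31 for 31 March, which is my reading of that column; `month_dot_day` and `day_of_year` are names invented for this illustration. It compares the spacing of such values with a plain day-of-year index across a month boundary.

```r
# Made-up dates spanning a month boundary (illustration only).
dates <- as.Date(c("2020-03-30", "2020-03-31", "2020-04-01", "2020-04-02"))

# Assumed form of the Month.Date concatenation: 31 March -> 3.31, 1 April -> 4.01.
month_dot_day <- as.numeric(format(dates, "%m.%d"))

# Day of year: increases by exactly one per calendar day.
day_of_year <- as.integer(format(dates, "%j"))

# Compare the step sizes between consecutive days for the two encodings.
data.frame(date          = dates,
           month_dot_day = month_dot_day,
           day_of_year   = day_of_year)
diff(month_dot_day)  # step across the month boundary differs from the others
diff(day_of_year)    # equal steps of one day
```

If this reading of Date_2 is right, a regression slope against it would not be in units of "per day", whereas a day-of-year index would be. One caveat: 2020 is a leap year, so from March onward the same calendar date has a day-of-year value one higher than in 2018 or 2019.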
I used the gls() function to do this, and believe I have designated the date column as the time covariate for the serial correlation structure.
My attempt is displayed here:
library(nlme)
m <- gls(avg_daily_Rath_NO2 ~ Period,
         data = COVID_NO2_weekday_avgs_Rathmines,
         correlation = corARMA(p = 1, q = 0, form = ~date))
summary(m)
Output:
Generalized least squares fit by REML
Model: avg_daily_Rath_NO2 ~ Period
Data: COVID_NO2_weekday_avgs_Rathmines
Correlation Structure: ARMA(1,0)
Formula: ~date
Parameter estimate(s):
Phi1
0.6066636
Coefficients:
Correlation:
(Intr)
PeriodMarch-June 2020 -0.569
Standardized residuals:
       Min         Q1        Med         Q3        Max
-1.8573362 -0.6487672 -0.1588551  0.5597100  3.4017470
Residual standard error: 10.46725
Degrees of freedom: 256 total; 254 residual
I am a tad rusty when it comes to linear regression outputs, and am not sure how to interpret this one.
Additionally, I would like to check that my model is correctly structured to achieve my desired output.
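My worry is that the formula above contains only Period, so it compares mean levels rather than slopes. For reference, this is a minimal sketch of the kind of model I suspect is needed instead, run on simulated data rather than my own. The data frame `sim`, the columns `day` and `NO2`, and all simulated values are invented for this example; only gls() and corAR1() from nlme are real. It assumes one observation per day within each period and AR(1) errors within a period.

```r
library(nlme)

set.seed(1)

# Simulated stand-in for the real data: n consecutive days in each of two periods.
n <- 80
sim <- data.frame(
  day    = rep(seq_len(n), times = 2),
  Period = factor(rep(c("March-June 2018-2019", "March-June 2020"), each = n))
)

# AR(1) noise, generated separately for each period (values are invented).
noise <- c(arima.sim(list(ar = 0.6), n = n, sd = 5),
           arima.sim(list(ar = 0.6), n = n, sd = 5))

# Two straight lines with different intercepts and slopes, plus the noise.
sim$NO2 <- 30 - 0.05 * sim$day +
  ifelse(sim$Period == "March-June 2020", -5 - 0.10 * sim$day, 0) +
  noise

# day * Period expands to day + Period + day:Period.
# The day:Period coefficient estimates the difference in slopes.
# form = ~ day | Period restricts the AR(1) correlation to within each period.
m_int <- gls(NO2 ~ day * Period,
             data = sim,
             correlation = corAR1(form = ~ day | Period))

# Coefficient table: estimate, standard error, t-value and p-value per term.
summary(m_int)$tTable
```

In this sketch, the row for the day:Period interaction is the one that would speak to whether the slopes differ, and the grouping in the correlation formula stops the last day of one period from being treated as adjacent to the first day of the other. Whether this structure carries over to my weekday-only data, where weekends leave gaps in the day index, is part of what I am unsure about.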
Any help with this would be appreciated.
-TL;DR-
- I want to run an ANCOVA on the two lines to find out whether the slopes differ across the Period variable.
- I would like to remove the effect of serial correlation, since the independent variable is a time series.
What is the most effective way to accomplish this?
More information can be provided if necessary.