
I have created the following plot from three years of air quality observations, and would like to know whether the slopes differ between the two time periods (March-June 2018-2019 average vs. March-June 2020): [plot image]

A snapshot of my data frame is shown here: [image]

The figure is made using the following code:

Lockdown_Period_plot_weekday <- ggplot(COVID_NO2_weekday_avgs_Rathmines, aes(x = Date_1, y = avg_daily_Rath_NO2, color = Period, shape = Period)) +
   geom_smooth(method="lm", se = FALSE) +
   geom_point(size=2) +
   theme_bw() +
   labs(x = 'Date',
        y = 'Daily Avg [NO2] µg/m^3',
        title = 'Weekday NO2 Trends During Lockdown',
        subtitle = 'Rathmines AQ Station')

I know that I need to remove the effect of serial correlation first (as the independent variable is a time series), but I'm not exactly sure how to do this. Should I use the date column to do so? Or should I use the dummy column Date_2 to do this? This column is just a concatenation of Month.Date to create a series of x values that are numerical and continuous.

I used the gls() function for this, and I believe I have specified the date column as the time covariate of the serial correlation structure.

My attempt is displayed here:

library(nlme)
m <- gls(avg_daily_Rath_NO2 ~ Period,
         data=COVID_NO2_weekday_avgs_Rathmines,
         correlation=corARMA(p=1, q=0, form=~date))
summary(m)

Output:

Generalized least squares fit by REML
  Model: avg_daily_Rath_NO2 ~ Period 
  Data: COVID_NO2_weekday_avgs_Rathmines 

Correlation Structure: ARMA(1,0)
 Formula: ~date 
 Parameter estimate(s):
     Phi1 
0.6066636 

Coefficients:

 Correlation: 
                      (Intr)
PeriodMarch-June 2020 -0.569

Standardized residuals:
       Min         Q1        Med         Q3 
-1.8573362 -0.6487672 -0.1588551  0.5597100 
       Max 
 3.4017470 

Residual standard error: 10.46725 
Degrees of freedom: 256 total; 254 residual 


I am a tad rusty when it comes to linear regression outputs, and am not sure how to interpret this one.

Additionally, I would like to check that my model is correctly structured to achieve my desired output.

Any help with this would be appreciated.

-TL;DR-

  1. I want to run an ANCOVA on two lines to find out if the slopes differ across the Period variable.
  2. I would like to remove the effect of serial correlation since the independent variable is a time series.
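
To make the two points above concrete, here is a minimal sketch of the kind of model I think I am after, run on simulated data rather than my own. Everything in it is a made-up stand-in: the `sim` data frame, the `day` and `no2` columns, the `baseline`/`lockdown` level names, and the simulated slopes and AR coefficient. It assumes an integer day index with no repeats within each period, which my weekday-only dates may not satisfy as they stand, and I am not certain this is the right structure.

```r
library(nlme)

# Simulated stand-in data (hypothetical values, NOT the Rathmines measurements):
# two periods, one observation per integer day index, AR(1) noise in each period.
set.seed(1)
n <- 80
sim <- data.frame(
  Period = factor(rep(c("baseline", "lockdown"), each = n),
                  levels = c("baseline", "lockdown")),
  day    = rep(seq_len(n), times = 2)  # integer time index within each period
)
true_slope <- ifelse(sim$Period == "lockdown", -0.15, 0.05)
ar_noise   <- c(as.numeric(arima.sim(model = list(ar = 0.6), n = n)),
                as.numeric(arima.sim(model = list(ar = 0.6), n = n)))
sim$no2 <- 30 + true_slope * sim$day + 3 * ar_noise

# ANCOVA-style model: day * Period expands to day + Period + day:Period.
# The day:Period coefficient is the difference in slope between the periods.
# corAR1(form = ~ day | Period) fits AR(1) errors ordered by day,
# estimated separately within each level of Period.
m_slopes <- gls(no2 ~ day * Period,
                data = sim,
                correlation = corAR1(form = ~ day | Period))

# Coefficient table: estimate, standard error, t-value and p-value per term.
coef_table <- summary(m_slopes)$tTable
print(coef_table)
```

If this is the right idea, the row named `day:Periodlockdown` would be the test of whether the slopes differ, whereas my model above only has `Period` on the right-hand side and so compares mean levels, not slopes.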

What is the most effective way to accomplish this?

More information can be provided if necessary.

Hp88