I am new to R and have the following question.

Here is the data:

year <- c(2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015)
score1 <- c(35,  40,  45, 59, 62, 71, 78, 65, 78, 82, 89, 85, 78, 88, 96, 98)
score2 <- c(56,  45,  45, 59, 65, 73, 79, 69, 79, 88, 89, 88, 72, 85, 98, 92)
score <- array(c(score1, score2))

I have tried the segmented package, but it always estimates the breakpoints itself.

I want to fix the breakpoint at 2008 and fit a piecewise linear model. I also want to know the intercept, slope, and p-value. How can I do this in RStudio? Thanks.

Skyer

1 Answer

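Fit the following linear model, using the data in the Note at the end. The stage1 coefficient is the slope before 2008 and the stage2 coefficient is the change in slope at 2008, so the slope from 2008 onward is the sum of the two coefficients. For further stages/groups, add terms of the same form as stage2.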

stage1 <- year - 2000
stage2 <- (year - 2008) * (year >= 2008)
fm <- lm(score ~ stage1 + stage2)
summary(fm)
## Call:
## lm(formula = score ~ stage1 + stage2)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -10.2845  -2.9363  -0.7105   3.9286   8.4312 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  37.8653     3.4952  10.834 7.05e-08 ***
## stage1        5.2839     0.6578   8.032 2.14e-06 ***
## stage2       -3.2470     1.2715  -2.554    0.024 *  
## ---
## Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
## 
## Residual standard error: 5.861 on 13 degrees of freedom
## Multiple R-squared:  0.9205,    Adjusted R-squared:  0.9082 
## F-statistic: 75.22 on 2 and 13 DF,  p-value: 7.142e-08
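
To get the intercept, slopes and p-values as numbers rather than reading them off the printout, something like the following sketch may help. It refits the same model on the data from the Note; the names `tab`, `slope_before`, `slope_after` and `fit_at_break` are only illustrative.

```r
# Data from the Note at the end of the answer
year  <- c(2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007,
           2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015)
score <- c(35, 40, 45, 59, 62, 71, 78, 65, 78, 82, 89, 85, 78, 88, 96, 98)

stage1 <- year - 2000                      # runs over the whole period
stage2 <- (year - 2008) * (year >= 2008)   # 0 before 2008, then 0, 1, 2, ...
fm <- lm(score ~ stage1 + stage2)

# Coefficient table: Estimate, Std. Error, t value, Pr(>|t|)
tab <- coef(summary(fm))
pvals <- tab[, "Pr(>|t|)"]

b <- coef(fm)
intercept    <- b[["(Intercept)"]]              # fitted value at year 2000
slope_before <- b[["stage1"]]                   # slope for 2000-2008
slope_after  <- b[["stage1"]] + b[["stage2"]]   # slope from 2008 onward
fit_at_break <- intercept + 8 * slope_before    # fitted value at 2008

print(tab)
print(c(slope_before = slope_before, slope_after = slope_after))
```

Note that the p-value in the `stage2` row tests whether the slope changes at 2008, not whether the slope after 2008 is zero.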

Test whether the slope in stage 2 (the sum of the stage1 and stage2 coefficients) differs from 0:

library(car)

linearHypothesis(fm, "stage1 + stage2", verbose = TRUE)
## Hypothesis matrix:
##                 (Intercept) stage1 stage2
## stage1 + stage2           0      1      1
## 
## Right-hand-side vector:
## *rhs* 
##     0 
## 
## Estimated linear function (hypothesis.matrix %*% coef - rhs)
## stage1 + stage2 
##         2.03696 
## 
## 
## Estimated variance of linear function
## [1] 0.5848974
## 
## Linear hypothesis test
## 
## Hypothesis:
## stage1  + stage2 = 0
## 
## Model 1: restricted model
## Model 2: score ~ stage1 + stage2
## 
##   Res.Df    RSS Df Sum of Sq      F  Pr(>F)  
## 1     14 690.27                              
## 2     13 446.58  1    243.69 7.0939 0.01951 *
## ---
## Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
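
If the car package is not available, the same test can be reproduced in base R as a rough cross-check. This is a sketch, not the method used in the answer: it builds the contrast `stage1 + stage2` by hand (the vector `L` is an illustrative name) and uses the fact that for a single linear restriction the F statistic is the square of the t statistic.

```r
# Data from the Note at the end of the answer
year  <- c(2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007,
           2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015)
score <- c(35, 40, 45, 59, 62, 71, 78, 65, 78, 82, 89, 85, 78, 88, 96, 98)

stage1 <- year - 2000
stage2 <- (year - 2008) * (year >= 2008)
fm <- lm(score ~ stage1 + stage2)

# Contrast picking out stage1 + stage2 (order: intercept, stage1, stage2)
L <- c(0, 1, 1)

est  <- sum(L * coef(fm))                    # estimated slope after 2008
vest <- drop(t(L) %*% vcov(fm) %*% L)        # variance of that estimate
tval <- est / sqrt(vest)
Fval <- tval^2                               # equals the F statistic for 1 df
pval <- 2 * pt(abs(tval), df = df.residual(fm), lower.tail = FALSE)

print(c(estimate = est, variance = vest, F = Fval, p = pval))
```

The estimate, variance, F and p-value should agree with the `linearHypothesis` output above (2.03696, 0.5848974, 7.0939 and 0.01951).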

Graphics

plot(score ~ year)
lines(fitted(fm) ~ year, col = "red")
abline(v = 2008, lty = 2)

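[Figure: score plotted against year, with the fitted piecewise line in red and a dashed vertical line at 2008]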

Note

The data used is the following and is from the original version of the question.

year <- c(2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015)
score <- c(35, 40, 45, 59, 62, 71, 78, 65, 78, 82, 89, 85, 78, 88, 96, 98)
G. Grothendieck
  • Many thanks for solving my problem. I have another question: if I have many columns in 'score', is it possible to calculate the fitted information in batch? – Skyer Jun 16 '23 at 01:07
  • The LHS of the formula in lm can be a matrix. – G. Grothendieck Jun 16 '23 at 12:56
  • score1 <- c(35, 40, 45, 59, 62, 71, 78, 65, 78, 82, 89, 85, 78, 88, 96, 98), score2 <- c(35, 40, 45, 59, 62, 71, 78, 65, 78, 82, 89, 85, 78, 88, 96, 98) – Skyer Jun 16 '23 at 14:25
  • Try `lm(cbind(score1, score2) ~ stage1 + stage2)` – G. Grothendieck Jun 16 '23 at 14:29
  • Thanks for the quick answer, I am still editing... I have just updated the data in the original question. I want to do the fitting on score. I tried lm(cbind(score1, score2) ~ stage1 + stage2), but it gave an error: 'Error in model.frame.default(formula = cbind(score1, score2) ~ stage1 + : variable lengths differ (found for 'stage1')' – Skyer Jun 16 '23 at 14:35
  • It does not if score1 and score2 are the vectors in the comment. – G. Grothendieck Jun 16 '23 at 14:43
  • I have a matrix named score with 16 rows and 18 columns. Each column of the matrix should be fitted separately. I tried lm(score ~ stage1 + stage2) and lm(cbind(score) ~ stage1 + stage2), but neither works. Could you please give me some advice? Thanks a lot. – Skyer Jun 17 '23 at 03:18
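
As a sketch of the matrix-response approach suggested in the comments, using the two score columns from the question: the response must be a numeric matrix with one row per year, so that its row count matches the length of `stage1` and `stage2`. Note that `array(c(score1, score2))` in the question gives a one-dimensional array of length 32 rather than a 16 x 2 matrix, which is one possible cause of a "variable lengths differ" error. The cause of the failure with the 18-column matrix cannot be determined from the thread; if `score` there is a data frame, converting it with `as.matrix()` first may help (an assumption). The names `fm_all` and `pvals` are illustrative.

```r
# Data from the question
year   <- c(2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007,
            2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015)
score1 <- c(35, 40, 45, 59, 62, 71, 78, 65, 78, 82, 89, 85, 78, 88, 96, 98)
score2 <- c(56, 45, 45, 59, 65, 73, 79, 69, 79, 88, 89, 88, 72, 85, 98, 92)

# One column per response, one row per year (16 x 2)
score <- cbind(score1, score2)

stage1 <- year - 2000
stage2 <- (year - 2008) * (year >= 2008)

# The rows of the response must line up with the predictors
stopifnot(is.matrix(score), nrow(score) == length(stage1))

# A matrix on the left-hand side fits one regression per column
fm_all <- lm(score ~ stage1 + stage2)

# Coefficients: one column per response
print(coef(fm_all))

# summary() returns one summary per response; collect the p-values
pvals <- sapply(summary(fm_all), function(s) s$coefficients[, "Pr(>|t|)"])
print(pvals)
```

Each column of `coef(fm_all)` is the same as what a separate `lm()` call on that column alone would give.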