Finding p-value and z statistics along with the OLS Linear regression

Question

I could find the coefficients and intercepts from linear regression but unable to find a suitable method to get p-value and z value for respective variable trend. Additionally, not able to find a method to save the output results in excel format. The data is here. There are 24 variables against time. I am not getting the z-statistics and p-values, additionally estimates are also incorrect by first method. where am i wrong?

library("trend")

# read ozone data (I converted to a text file first)
otm <- read.table("D:/data.txt",header=T)


#  make a data frame version
otm_df <- data.frame(otm)
markers <- sample(0:1, replace = T, size = 11)

# calculate OLS slope for all columns
# the -1 at end removes the intercepts
ols <- sapply(otm_df, function(x) coef(lm(markers ~ x))[-1])

I tried this method. I didn't get the z-statistics and could not save it to excel format.

library(reshape2)
DF <- reshape2::melt(otm, id.var = "Year")
library(broom); library(tidyverse)
ols <- DF %>% nest(data = -variable) %>% 
  mutate(model = map(data, ~lm(value ~ Year, data = .)), 
         tidied = map(model, tidy)) %>% 
  unnest(tidied)

#to save the results in excel format (not working here for me)
capture.output(summary(ols), file = "ols.csv" )
write.csv(ols, file.path('E:/',filename = "ols2.csv"), row.names = TRUE)

# A tibble: 48 x 8
   variable data              model  term         estimate std.error statistic p.value
   <fct>    <list>            <list> <chr>           <dbl>     <dbl>     <dbl>   <dbl>
 1 BanTES   <tibble [11 x 2]> <lm>   (Intercept) -236.       488.       -0.483   0.641
 2 BanTES   <tibble [11 x 2]> <lm>   Year           0.139      0.242     0.572   0.582
 3 SriTES   <tibble [11 x 2]> <lm>   (Intercept)  220.       351.        0.627   0.546
 4 SriTES   <tibble [11 x 2]> <lm>   Year          -0.0935     0.174    -0.536   0.605
 5 AfgTES   <tibble [11 x 2]> <lm>   (Intercept)  364.       444.        0.820   0.434
 6 AfgTES   <tibble [11 x 2]> <lm>   Year          -0.161      0.221    -0.730   0.484
 7 BhuTES   <tibble [11 x 2]> <lm>   (Intercept)  373.       831.        0.449   0.664
 8 BhuTES   <tibble [11 x 2]> <lm>   Year          -0.170      0.413    -0.412   0.690
 9 IndTES   <tibble [11 x 2]> <lm>   (Intercept) -342.       213.       -1.60    0.143
10 IndTES   <tibble [11 x 2]> <lm>   Year           0.190      0.106     1.80    0.106

summary(ols)

    variable  data.Length  data.Class  data.Mode model.Length  model.Class  model.Mode     term          
 BanTES : 2   2       tbl_df  list               12    lm    list                      Length:48         
 SriTES : 2   2       tbl_df  list               12    lm    list                      Class :character  
 AfgTES : 2   2       tbl_df  list               12    lm    list                      Mode  :character  
 BhuTES : 2   2       tbl_df  list               12    lm    list                                        
 IndTES : 2   2       tbl_df  list               12    lm    list                                        
 NepTES : 2   2       tbl_df  list               12    lm    list                                        
 (Other):36   2       tbl_df  list               12    lm    list

Any help will be useful. Thank you in advance !

There is (although of course it's a t-statistic and not a z-statistic, as it should be). — Roland, Oct 06 '21 at 15:01
I checked with the estimates with excel. The OLS slope values of excel and R software do not match. Why is that so? — Lalantra, Oct 06 '21 at 17:13
Are you sure you are using the same DV and IV in Excel? Beginners sometimes switch these. — Roland, Oct 08 '21 at 10:23
Please [provide a minimal reproducible example](https://stackoverflow.com/a/5963610/1412059). — Roland, Oct 11 '21 at 13:01
Do you mean to `map(model, tidy))`? You have `map(models, tidy))` — Robin Gertenbach, Oct 11 '21 at 13:09
@Robin sorry..updated.. it should be ```map(model, tidy))``` I need z-statistics instead of t-statistics. Additionally, how to get the values "coeff, z- stats, and p-value in excel format? ```capture.output(summary(ols), file = "ols.csv" )''' ``` and ```write.csv(ols, file.path('E:/',filename = "ols2.csv"), row.names = TRUE)``` is not helpful — Lalantra, Oct 11 '21 at 13:59

UseR10085 · Accepted Answer · 2021-10-16T16:22:26.630

Linear regression does not give you Z statistic as rightly commented by @Roland rather linear regression gives you t statistic. You can use the following code to save the coeff, t statistic, and p-value in excel format

library(tidyverse)
library(broom)
library(openxlsx)

# read ozone data (I converted to a text file first)
otm <- read.table("data.txt", header=T)
head(otm, 2)

otm %>% 
  pivot_longer(-c("Year")) %>%
  group_by(name) %>% 
  do(fitlm = tidy(lm(value ~ Year, data = ., na.action=na.omit))) %>% 
  unnest(fitlm) %>% 
  write.xlsx("Linear_trend_1.xlsx")

In the "Linear_trend_1.xlsx" file, Year row for each location provides you the slope and the statistic is t statistic.

Your values from the first method are not matching with the second method because the first method uses markers as the dependent variable and all the variables in the otm_df were used as the independent variables. In the second method, Year was used as independent variable while all the location values were the dependent variable.
Another thing, you are using sample function to create markers which will give you different outputs for every run. So you have to use set.seed to make it constant for every run like

set.seed(123)
otm %>% 
  mutate(markers = sample(0:1, replace = T, size = 11)) %>% 
  pivot_longer(-c("Year", "markers")) %>%
  group_by(name) %>% 
  do(fitlm = tidy(lm(markers ~ value, data = ., na.action=na.omit))) %>% 
  unnest(fitlm) %>% 
  write.xlsx("Linear_trend_2.xlsx")

# A tibble: 48 x 6
   name   term        estimate std.error statistic p.value
   <chr>  <chr>          <dbl>     <dbl>     <dbl>   <dbl>
 1 AfgERA (Intercept)  -8.58      7.79      -1.10    0.299
 2 AfgERA value         0.577     0.498      1.16    0.276
 3 AfgTES (Intercept)  -1.62      2.99      -0.542   0.601
 4 AfgTES value         0.0521    0.0750     0.695   0.505
 5 AfgTOM (Intercept) -25.3      19.5       -1.30    0.225
 6 AfgTOM value         1.94      1.46       1.33    0.218
 7 BanERA (Intercept)  -2.28      5.03      -0.453   0.662
 8 BanERA value         0.201     0.370      0.543   0.600
 9 BanTES (Intercept)  -3.42      2.80      -1.22    0.253
10 BanTES value         0.0892    0.0644     1.39    0.199
# ... with 38 more rows

If I modify your first approach like the following, you can see that the output from the above code and your code is basically same

#  make a data frame version
otm_df <- data.frame(otm)
set.seed(123) #To have same output from sample function for every run
markers <- sample(0:1, replace = T, size = 11)

# calculate OLS slope for all columns
# the -1 at end removes the intercepts
ols <- sapply(otm_df, function(x) coef(lm(markers ~ x))[-1])

Year.x     BanTES.x     SriTES.x     AfgTES.x     BhuTES.x     IndTES.x 
 0.054545455  0.089176876  0.092629314  0.052132553  0.001602003  0.107446434 
    NepTES.x     PakTES.x      SATES.x     BanERA.x     SriERA.x     AfgERA.x 
 0.065125607 -0.115108438  0.315928753  0.200712285 -0.519996739  0.577317012 
    BhuERA.x     IndERA.x     NepERA.x     PakERA.x      SAERA.x     BanTOM.x 
-0.140990921  0.110578204 -0.030546686  0.265909319  0.176797510  1.897234338 
    SriTOM.x     AfgTOM.x     BhuTOM.x     IndTOm.x     NepTOM.x     PakTOM.x 
 5.380610477  1.935170281  1.761172711  2.248531891  2.107380452  2.011356580 
     SATOM.x 
 2.214848959

Here is the data in dput format

dput(otm)
structure(list(Year = 2008:2018, BanTES = c(44.06247376, 43.81239107, 
40.68010622, 46.97760506, 37.49591135, 43.81239107, 43.81239107, 
43.81239107, 43.81239107, 45.27803189, 44.06247376), SriTES = c(32.07268265, 
35.01918477, 29.91018035, 34.1291577, 28.5258431, 32.07268265, 
32.07268265, 32.07268265, 32.07268265, 30.96753552, 32.07268265
), AfgTES = c(39.19328867, 42.06325898, 42.31015918, 40.54543762, 
34.28696385, 40.54543762, 40.54543762, 40.54543762, 40.54543762, 
37.38974643, 39.19328867), BhuTES = c(34.08241824, 36.95440954, 
30.41561338, 30.37004394, 19.8861367, 30.41561338, 30.41561338, 
30.41561338, 30.41561338, 32.09933763, 32.09933763), IndTES = c(40.05913352, 
41.54741392, 38.88957844, 42.47544504, 43.24350644, 41.54741392, 
41.54741392, 41.54741392, 41.54741392, 42.65820983, 42.47544504
), NepTES = c(38.12979871, 37.62785275, 34.40488247, 37.7995467, 
39.64286364, 37.7995467, 37.7995467, 37.7995467, 37.7995467, 
30.63632105, 37.7995467), PakTES = c(41.38388734, 41.99865359, 
42.16236093, 42.51838941, 43.4444952, 42.16236093, 42.16236093, 
42.16236093, 42.16236093, 44.96627251, 42.51838941), SATES = c(40.03077, 
41.52302, 39.6327, 41.9098, 41.11191, 41.11191, 41.11191, 41.11191, 
41.11191, 41.57009, 41.52302), BanERA = c(13.76686693, 13.904453, 
13.40584856, 13.45199721, 13.47657436, 12.8992102, 13.3586098, 
14.23223365, 13.4228729, 13.21487616, 14.50830571), SriERA = c(11.81852768, 
11.79187354, 11.51484349, 11.50552588, 11.489789, 11.23384852, 
10.61182708, 11.33951759, 11.6357584, 11.74685028, 12.14987906
), AfgERA = c(15.44115983, 15.425995, 15.6161623, 15.47751927, 
15.81748069, 15.47498417, 15.41748855, 16.06462541, 15.61143062, 
15.32810621, 16.39162424), BhuERA = c(14.34493453, 14.28085419, 
14.24728543, 14.03202106, 14.04152053, 13.42977221, 13.22665229, 
14.344052, 13.58792484, 13.28851619, 14.28029524), IndERA = c(14.08262362, 
14.11485037, 13.80713493, 13.86114379, 13.92607879, 13.37996473, 
13.45767152, 14.49365275, 13.88142768, 13.73986257, 14.77032404
), NepERA = c(14.93883379, 14.896056, 14.78607828, 14.50880606, 
14.69793299, 13.96309811, 14.18825383, 15.32530354, 14.38700954, 
13.98545482, 14.9828434), PakERA = c(14.89773191, 14.87337075, 
14.89837223, 14.76236826, 15.13918051, 14.61385609, 14.54589641, 
15.40150813, 14.8588883, 14.62185208, 15.6575491), SAERA = c(14.38877, 
14.40468, 14.22069, 14.20561, 14.35855, 13.8399, 13.88027, 14.83054, 
14.24554, 14.06201, 15.09615), BanTOM = c(9.317937851, 9.308046341, 
9.327401161, 9.319338799, 9.311285019, 9.300317764, 9.292790413, 
9.540946007, 9.04840374, 9.300317764, 9.317937851), SriTOM = c(5.437336445, 
5.436554909, 5.435492039, 5.440690994, 5.436693192, 5.440601349, 
5.427892685, 5.54946661, 5.427827358, 5.440601349, 5.437336445
), AfgTOM = c(13.31581736, 13.30339324, 13.30090284, 13.29781604, 
13.33800817, 13.31919873, 13.30073023, 13.62503445, 13.16488469, 
13.30073023, 13.31581736), BhuTOM = c(11.69911337, 11.67375898, 
11.71142626, 11.69099903, 11.68556881, 11.68714046, 11.65387106, 
11.97064924, 11.44872904, 11.67375898, 11.69099903), IndTOm = c(9.709311898, 
9.704142364, 9.703938368, 9.72520479, 9.709638531, 9.708799697, 
9.690851817, 9.952517961, 9.499369441, 9.704142364, 9.709638531
), NepTOM = c(12.45835066, 12.43677187, 12.48850822, 12.49002218, 
12.46283817, 12.50376368, 12.44072294, 12.78685617, 12.27891684, 
12.44072294, 12.46283817), PakTOM = c(12.37911913, 12.38028261, 
12.37067625, 12.38315158, 12.38352468, 12.36856567, 12.37349086, 
12.67422019, 12.18962786, 12.37349086, 12.38315158), SATOM = c(10.63543, 
10.62967, 10.62981, 10.64489, 10.63941, 10.63525, 10.6195, 10.89613, 
10.44028, 10.62967, 10.63941)), class = "data.frame", row.names = c(NA, 
-11L))

I was just checking the results and it seems first method gives the correct result of OLS and not the second method. The slope value of BanTES from 1st method is 0.138595178 (correct) whereas 2nd method gives a values of 0.089176876 which is incorrect. — Lalantra, Oct 22 '21 at 07:24
Actually, the first method is correct. But to keep a similarity with your question I have shown the second method. You can take the output according to your need. — UseR10085, Oct 22 '21 at 10:05

score 2 · Answer 2 · edited Oct 16 '21 at 02:02

You can use glance instead of tidy for the p-value:

ols <- DF %>% 
  nest(data = -variable) %>% 
  mutate(model  = map(data, ~lm(value ~ Year, data = .)), 
         tidied = map(model, glance)) %>% 
  unnest(tidied)

> ols
# A tibble: 24 x 15
   variable data              model  r.squared adj.r.squared sigma statistic p.value    df logLik   AIC   BIC deviance df.residual  nobs
   <fct>    <list>            <list>     <dbl>         <dbl> <dbl>     <dbl>   <dbl> <dbl>  <dbl> <dbl> <dbl>    <dbl>       <int> <int>
 1 BanTES   <tibble [11 x 2]> <lm>     0.0350        -0.0722 2.54     0.327    0.582     1 -24.8   55.5  56.7    58.2            9    11
 2 SriTES   <tibble [11 x 2]> <lm>     0.0309        -0.0767 1.83     0.287    0.605     1 -21.2   48.3  49.5    30.1            9    11
 3 AfgTES   <tibble [11 x 2]> <lm>     0.0559        -0.0490 2.32     0.533    0.484     1 -23.7   53.5  54.7    48.2            9    11
 4 BhuTES   <tibble [11 x 2]> <lm>     0.0185        -0.0905 4.33     0.170    0.690     1 -30.6   67.3  68.4   169.             9    11
 5 IndTES   <tibble [11 x 2]> <lm>     0.264          0.183  1.11     3.23     0.106     1 -15.7   37.3  38.5    11.1            9    11
 6 NepTES   <tibble [11 x 2]> <lm>     0.0689        -0.0345 2.49     0.666    0.435     1 -24.5   55.0  56.2    55.6            9    11
 7 PakTES   <tibble [11 x 2]> <lm>     0.243          0.159  0.872    2.89     0.123     1 -13.0   32.0  33.2     6.84           9    11
 8 SATES    <tibble [11 x 2]> <lm>     0.221          0.135  0.625    2.56     0.144     1  -9.34  24.7  25.9     3.52           9    11
 9 BanERA   <tibble [11 x 2]> <lm>     0.0252        -0.0831 0.482    0.233    0.641     1  -6.49  19.0  20.2     2.09           9    11
10 SriERA   <tibble [11 x 2]> <lm>     0.00230       -0.109  0.416    0.0208   0.889     1  -4.87  15.7  16.9     1.56           9    11

Thanks but where is the slope values? what statistic value is it? Is it Z statistic or t statistic? If it's t statistic then why the values you estimated dont match the one I derived? how to save the results in excel format with coeff, z statistics, and p value? — Lalantra, Oct 13 '21 at 03:38
@Bappa Das and @Roland answered the Z-stat and t-stat question. For the slope and intercept, you can go back to `tidy` instead of `glance`. And to save the results, I usually prefer the `write.csv2` function. — Mata, Oct 14 '21 at 07:43

score 2 · Answer 3 · answered Oct 13 '21 at 21:46

Regarding saving the data to Excel, if you have your data in a tibble or data.frame, then you can use the function:

readr::write_csv to save the data as a CSV file, which can be easily opened in Excel;
writexl::write_xlsx to save the data as a native Excel file.

Either function will save all rows and columns in your data to a file.

Finding p-value and z statistics along with the OLS Linear regression

3 Answers3