3

I could find the coefficients and intercepts from linear regression but unable to find a suitable method to get p-value and z value for respective variable trend. Additionally, not able to find a method to save the output results in excel format. The data is here. There are 24 variables against time. I am not getting the z-statistics and p-values, additionally estimates are also incorrect by first method. where am i wrong?

library("trend")

# read ozone data (I converted to a text file first)
otm <- read.table("D:/data.txt",header=T)


#  make a data frame version
otm_df <- data.frame(otm)
markers <- sample(0:1, replace = T, size = 11)

# calculate OLS slope for all columns
# the -1 at end removes the intercepts
ols <- sapply(otm_df, function(x) coef(lm(markers ~ x))[-1])

I tried this method. I didn't get the z-statistics and could not save it to excel format.

library(reshape2)
DF <- reshape2::melt(otm, id.var = "Year")
library(broom); library(tidyverse)
ols <- DF %>% nest(data = -variable) %>% 
  mutate(model = map(data, ~lm(value ~ Year, data = .)), 
         tidied = map(model, tidy)) %>% 
  unnest(tidied)

#to save the results in excel format (not working here for me)
capture.output(summary(ols), file = "ols.csv" )
write.csv(ols, file.path('E:/',filename = "ols2.csv"), row.names = TRUE) 
# A tibble: 48 x 8
   variable data              model  term         estimate std.error statistic p.value
   <fct>    <list>            <list> <chr>           <dbl>     <dbl>     <dbl>   <dbl>
 1 BanTES   <tibble [11 x 2]> <lm>   (Intercept) -236.       488.       -0.483   0.641
 2 BanTES   <tibble [11 x 2]> <lm>   Year           0.139      0.242     0.572   0.582
 3 SriTES   <tibble [11 x 2]> <lm>   (Intercept)  220.       351.        0.627   0.546
 4 SriTES   <tibble [11 x 2]> <lm>   Year          -0.0935     0.174    -0.536   0.605
 5 AfgTES   <tibble [11 x 2]> <lm>   (Intercept)  364.       444.        0.820   0.434
 6 AfgTES   <tibble [11 x 2]> <lm>   Year          -0.161      0.221    -0.730   0.484
 7 BhuTES   <tibble [11 x 2]> <lm>   (Intercept)  373.       831.        0.449   0.664
 8 BhuTES   <tibble [11 x 2]> <lm>   Year          -0.170      0.413    -0.412   0.690
 9 IndTES   <tibble [11 x 2]> <lm>   (Intercept) -342.       213.       -1.60    0.143
10 IndTES   <tibble [11 x 2]> <lm>   Year           0.190      0.106     1.80    0.106 
summary(ols)
    variable  data.Length  data.Class  data.Mode model.Length  model.Class  model.Mode     term          
 BanTES : 2   2       tbl_df  list               12    lm    list                      Length:48         
 SriTES : 2   2       tbl_df  list               12    lm    list                      Class :character  
 AfgTES : 2   2       tbl_df  list               12    lm    list                      Mode  :character  
 BhuTES : 2   2       tbl_df  list               12    lm    list                                        
 IndTES : 2   2       tbl_df  list               12    lm    list                                        
 NepTES : 2   2       tbl_df  list               12    lm    list                                        
 (Other):36   2       tbl_df  list               12    lm    list  

Any help will be useful. Thank you in advance !

UseR10085
  • 7,120
  • 3
  • 24
  • 54
Lalantra
  • 67
  • 1
  • 11

3 Answers3

3

Linear regression does not give you Z statistic as rightly commented by @Roland rather linear regression gives you t statistic. You can use the following code to save the coeff, t statistic, and p-value in excel format

library(tidyverse)
library(broom)
library(openxlsx)

# read ozone data (I converted to a text file first)
otm <- read.table("data.txt", header=T)
head(otm, 2)

otm %>% 
  pivot_longer(-c("Year")) %>%
  group_by(name) %>% 
  do(fitlm = tidy(lm(value ~ Year, data = ., na.action=na.omit))) %>% 
  unnest(fitlm) %>% 
  write.xlsx("Linear_trend_1.xlsx")

In the "Linear_trend_1.xlsx" file, Year row for each location provides you the slope and the statistic is t statistic.

Your values from the first method are not matching with the second method because the first method uses markers as the dependent variable and all the variables in the otm_df were used as the independent variables. In the second method, Year was used as independent variable while all the location values were the dependent variable.
Another thing, you are using sample function to create markers which will give you different outputs for every run. So you have to use set.seed to make it constant for every run like

set.seed(123)
otm %>% 
  mutate(markers = sample(0:1, replace = T, size = 11)) %>% 
  pivot_longer(-c("Year", "markers")) %>%
  group_by(name) %>% 
  do(fitlm = tidy(lm(markers ~ value, data = ., na.action=na.omit))) %>% 
  unnest(fitlm) %>% 
  write.xlsx("Linear_trend_2.xlsx")
# A tibble: 48 x 6
   name   term        estimate std.error statistic p.value
   <chr>  <chr>          <dbl>     <dbl>     <dbl>   <dbl>
 1 AfgERA (Intercept)  -8.58      7.79      -1.10    0.299
 2 AfgERA value         0.577     0.498      1.16    0.276
 3 AfgTES (Intercept)  -1.62      2.99      -0.542   0.601
 4 AfgTES value         0.0521    0.0750     0.695   0.505
 5 AfgTOM (Intercept) -25.3      19.5       -1.30    0.225
 6 AfgTOM value         1.94      1.46       1.33    0.218
 7 BanERA (Intercept)  -2.28      5.03      -0.453   0.662
 8 BanERA value         0.201     0.370      0.543   0.600
 9 BanTES (Intercept)  -3.42      2.80      -1.22    0.253
10 BanTES value         0.0892    0.0644     1.39    0.199
# ... with 38 more rows

If I modify your first approach like the following, you can see that the output from the above code and your code is basically same

#  make a data frame version
otm_df <- data.frame(otm)
set.seed(123) #To have same output from sample function for every run
markers <- sample(0:1, replace = T, size = 11)

# calculate OLS slope for all columns
# the -1 at end removes the intercepts
ols <- sapply(otm_df, function(x) coef(lm(markers ~ x))[-1])
Year.x     BanTES.x     SriTES.x     AfgTES.x     BhuTES.x     IndTES.x 
 0.054545455  0.089176876  0.092629314  0.052132553  0.001602003  0.107446434 
    NepTES.x     PakTES.x      SATES.x     BanERA.x     SriERA.x     AfgERA.x 
 0.065125607 -0.115108438  0.315928753  0.200712285 -0.519996739  0.577317012 
    BhuERA.x     IndERA.x     NepERA.x     PakERA.x      SAERA.x     BanTOM.x 
-0.140990921  0.110578204 -0.030546686  0.265909319  0.176797510  1.897234338 
    SriTOM.x     AfgTOM.x     BhuTOM.x     IndTOm.x     NepTOM.x     PakTOM.x 
 5.380610477  1.935170281  1.761172711  2.248531891  2.107380452  2.011356580 
     SATOM.x 
 2.214848959 

Here is the data in dput format

dput(otm)
structure(list(Year = 2008:2018, BanTES = c(44.06247376, 43.81239107, 
40.68010622, 46.97760506, 37.49591135, 43.81239107, 43.81239107, 
43.81239107, 43.81239107, 45.27803189, 44.06247376), SriTES = c(32.07268265, 
35.01918477, 29.91018035, 34.1291577, 28.5258431, 32.07268265, 
32.07268265, 32.07268265, 32.07268265, 30.96753552, 32.07268265
), AfgTES = c(39.19328867, 42.06325898, 42.31015918, 40.54543762, 
34.28696385, 40.54543762, 40.54543762, 40.54543762, 40.54543762, 
37.38974643, 39.19328867), BhuTES = c(34.08241824, 36.95440954, 
30.41561338, 30.37004394, 19.8861367, 30.41561338, 30.41561338, 
30.41561338, 30.41561338, 32.09933763, 32.09933763), IndTES = c(40.05913352, 
41.54741392, 38.88957844, 42.47544504, 43.24350644, 41.54741392, 
41.54741392, 41.54741392, 41.54741392, 42.65820983, 42.47544504
), NepTES = c(38.12979871, 37.62785275, 34.40488247, 37.7995467, 
39.64286364, 37.7995467, 37.7995467, 37.7995467, 37.7995467, 
30.63632105, 37.7995467), PakTES = c(41.38388734, 41.99865359, 
42.16236093, 42.51838941, 43.4444952, 42.16236093, 42.16236093, 
42.16236093, 42.16236093, 44.96627251, 42.51838941), SATES = c(40.03077, 
41.52302, 39.6327, 41.9098, 41.11191, 41.11191, 41.11191, 41.11191, 
41.11191, 41.57009, 41.52302), BanERA = c(13.76686693, 13.904453, 
13.40584856, 13.45199721, 13.47657436, 12.8992102, 13.3586098, 
14.23223365, 13.4228729, 13.21487616, 14.50830571), SriERA = c(11.81852768, 
11.79187354, 11.51484349, 11.50552588, 11.489789, 11.23384852, 
10.61182708, 11.33951759, 11.6357584, 11.74685028, 12.14987906
), AfgERA = c(15.44115983, 15.425995, 15.6161623, 15.47751927, 
15.81748069, 15.47498417, 15.41748855, 16.06462541, 15.61143062, 
15.32810621, 16.39162424), BhuERA = c(14.34493453, 14.28085419, 
14.24728543, 14.03202106, 14.04152053, 13.42977221, 13.22665229, 
14.344052, 13.58792484, 13.28851619, 14.28029524), IndERA = c(14.08262362, 
14.11485037, 13.80713493, 13.86114379, 13.92607879, 13.37996473, 
13.45767152, 14.49365275, 13.88142768, 13.73986257, 14.77032404
), NepERA = c(14.93883379, 14.896056, 14.78607828, 14.50880606, 
14.69793299, 13.96309811, 14.18825383, 15.32530354, 14.38700954, 
13.98545482, 14.9828434), PakERA = c(14.89773191, 14.87337075, 
14.89837223, 14.76236826, 15.13918051, 14.61385609, 14.54589641, 
15.40150813, 14.8588883, 14.62185208, 15.6575491), SAERA = c(14.38877, 
14.40468, 14.22069, 14.20561, 14.35855, 13.8399, 13.88027, 14.83054, 
14.24554, 14.06201, 15.09615), BanTOM = c(9.317937851, 9.308046341, 
9.327401161, 9.319338799, 9.311285019, 9.300317764, 9.292790413, 
9.540946007, 9.04840374, 9.300317764, 9.317937851), SriTOM = c(5.437336445, 
5.436554909, 5.435492039, 5.440690994, 5.436693192, 5.440601349, 
5.427892685, 5.54946661, 5.427827358, 5.440601349, 5.437336445
), AfgTOM = c(13.31581736, 13.30339324, 13.30090284, 13.29781604, 
13.33800817, 13.31919873, 13.30073023, 13.62503445, 13.16488469, 
13.30073023, 13.31581736), BhuTOM = c(11.69911337, 11.67375898, 
11.71142626, 11.69099903, 11.68556881, 11.68714046, 11.65387106, 
11.97064924, 11.44872904, 11.67375898, 11.69099903), IndTOm = c(9.709311898, 
9.704142364, 9.703938368, 9.72520479, 9.709638531, 9.708799697, 
9.690851817, 9.952517961, 9.499369441, 9.704142364, 9.709638531
), NepTOM = c(12.45835066, 12.43677187, 12.48850822, 12.49002218, 
12.46283817, 12.50376368, 12.44072294, 12.78685617, 12.27891684, 
12.44072294, 12.46283817), PakTOM = c(12.37911913, 12.38028261, 
12.37067625, 12.38315158, 12.38352468, 12.36856567, 12.37349086, 
12.67422019, 12.18962786, 12.37349086, 12.38315158), SATOM = c(10.63543, 
10.62967, 10.62981, 10.64489, 10.63941, 10.63525, 10.6195, 10.89613, 
10.44028, 10.62967, 10.63941)), class = "data.frame", row.names = c(NA, 
-11L))
UseR10085
  • 7,120
  • 3
  • 24
  • 54
  • Thanx all of you – Lalantra Oct 17 '21 at 06:42
  • I was just checking the results and it seems first method gives the correct result of OLS and not the second method. The slope value of BanTES from 1st method is 0.138595178 (correct) whereas 2nd method gives a values of 0.089176876 which is incorrect. – Lalantra Oct 22 '21 at 07:24
  • Actually, the first method is correct. But to keep a similarity with your question I have shown the second method. You can take the output according to your need. – UseR10085 Oct 22 '21 at 10:05
2

You can use glance instead of tidy for the p-value:

ols <- DF %>% 
  nest(data = -variable) %>% 
  mutate(model  = map(data, ~lm(value ~ Year, data = .)), 
         tidied = map(model, glance)) %>% 
  unnest(tidied)

> ols
# A tibble: 24 x 15
   variable data              model  r.squared adj.r.squared sigma statistic p.value    df logLik   AIC   BIC deviance df.residual  nobs
   <fct>    <list>            <list>     <dbl>         <dbl> <dbl>     <dbl>   <dbl> <dbl>  <dbl> <dbl> <dbl>    <dbl>       <int> <int>
 1 BanTES   <tibble [11 x 2]> <lm>     0.0350        -0.0722 2.54     0.327    0.582     1 -24.8   55.5  56.7    58.2            9    11
 2 SriTES   <tibble [11 x 2]> <lm>     0.0309        -0.0767 1.83     0.287    0.605     1 -21.2   48.3  49.5    30.1            9    11
 3 AfgTES   <tibble [11 x 2]> <lm>     0.0559        -0.0490 2.32     0.533    0.484     1 -23.7   53.5  54.7    48.2            9    11
 4 BhuTES   <tibble [11 x 2]> <lm>     0.0185        -0.0905 4.33     0.170    0.690     1 -30.6   67.3  68.4   169.             9    11
 5 IndTES   <tibble [11 x 2]> <lm>     0.264          0.183  1.11     3.23     0.106     1 -15.7   37.3  38.5    11.1            9    11
 6 NepTES   <tibble [11 x 2]> <lm>     0.0689        -0.0345 2.49     0.666    0.435     1 -24.5   55.0  56.2    55.6            9    11
 7 PakTES   <tibble [11 x 2]> <lm>     0.243          0.159  0.872    2.89     0.123     1 -13.0   32.0  33.2     6.84           9    11
 8 SATES    <tibble [11 x 2]> <lm>     0.221          0.135  0.625    2.56     0.144     1  -9.34  24.7  25.9     3.52           9    11
 9 BanERA   <tibble [11 x 2]> <lm>     0.0252        -0.0831 0.482    0.233    0.641     1  -6.49  19.0  20.2     2.09           9    11
10 SriERA   <tibble [11 x 2]> <lm>     0.00230       -0.109  0.416    0.0208   0.889     1  -4.87  15.7  16.9     1.56           9    11
Rich Pauloo
  • 7,734
  • 4
  • 37
  • 69
Mata
  • 538
  • 3
  • 17
  • Thanks but where is the slope values? what statistic value is it? Is it Z statistic or t statistic? If it's t statistic then why the values you estimated dont match the one I derived? how to save the results in excel format with coeff, z statistics, and p value? – Lalantra Oct 13 '21 at 03:38
  • 1
    @Bappa Das and @Roland answered the Z-stat and t-stat question. For the slope and intercept, you can go back to `tidy` instead of `glance`. And to save the results, I usually prefer the `write.csv2` function. – Mata Oct 14 '21 at 07:43
2

Regarding saving the data to Excel, if you have your data in a tibble or data.frame, then you can use the function:

  • readr::write_csv to save the data as a CSV file, which can be easily opened in Excel;
  • writexl::write_xlsx to save the data as a native Excel file.

Either function will save all rows and columns in your data to a file.

Guilherme Salomé
  • 1,899
  • 4
  • 19
  • 39