How to simplify code in R (normality test): different sample sizes in 1 line or 2 lines of code?

Question

I want to write my normality tests a bit more cleanly and run a simulation (repeat each test 1000 times).

sample <- c(10,30,50,100,500)
shapiro.test(rnorm(sample))

    Shapiro-Wilk normality test

data:  rnorm(sample)
W = 0.90644, p-value = 0.4465

This only gives one output as you can observe above. How do I get 5 outputs? Is there something I am missing here..?

Using the replicate function gives me 1000 test statistics per sample size, but I am only interested in the p-values and in comparing them to a significance level. For the individual normality tests I used the following code (thanks to user StupidWolf, from my previously posted questions on Stack Overflow):

replicate_sw10 = replicate(1000,shapiro.test(rnorm(10)))
table(replicate_sw10["p.value",]<0.10)/1000
#which gave the following output
> FALSE  TRUE 
> 0.896 0.104

`lapply(sample, function (x) shapiro.test(rnorm(x)))` – Matt Mar 15 '20 at 13:03
Thanks Matt for your answer! – FinancialRiskManagerBE Mar 16 '20 at 19:11

Annet · Answer 1 · 2020-03-15T14:03:13.397

Using the purrr package:

library(purrr)
map(sample, function(x) shapiro.test(rnorm(x)))

which gives

[[1]]

    Shapiro-Wilk normality test

data:  rnorm(x)
W = 0.92567, p-value = 0.4067


[[2]]

    Shapiro-Wilk normality test

data:  rnorm(x)
W = 0.95621, p-value = 0.247


[[3]]

    Shapiro-Wilk normality test

data:  rnorm(x)
W = 0.96144, p-value = 0.1021


[[4]]

    Shapiro-Wilk normality test

data:  rnorm(x)
W = 0.98654, p-value = 0.4077


[[5]]

    Shapiro-Wilk normality test

data:  rnorm(x)
W = 0.99597, p-value = 0.2324
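
For context on why the original `shapiro.test(rnorm(sample))` returned a single test: when the first argument of `rnorm` is a vector of length greater than one, its length is taken as the number of values to draw. A minimal sketch of that behaviour (the name `sizes` is only an illustrative stand-in for `sample`):

```r
sizes <- c(10, 30, 50, 100, 500)

# A vector argument means "draw length(sizes) values", not one sample per size
x <- rnorm(sizes)
length(x)        # 5

# One sample per size requires iterating over the sizes
draws <- lapply(sizes, rnorm)
lengths(draws)   # 10 30 50 100 500
```

This is why `map` or `lapply` is needed: each sample size has to be passed to `rnorm` separately.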

Edit: after your edit you are asking for a table. That doesn't work the way you did it with your replicate_sw10 example, because that object is a matrix, while map (or lapply, for that matter) returns a list. So you again need lapply or map to apply the same transformation to every element of the list.

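# one matrix of test results per sample size (rows = result components, columns = replicates)
replicate_swall <- map(sample, function(x) replicate(1000, shapiro.test(rnorm(x))))

# take the p-value row of each matrix, then pool all p-values into one list
replicate_pvalue_extract <- map(replicate_swall, function(x) x["p.value", ]) %>% unlist(recursive = FALSE)

table(replicate_pvalue_extract < 0.10) / length(replicate_pvalue_extract)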

This will give you output of this form (the exact proportions vary from run to run, because the data are random):

FALSE  TRUE 
0.896 0.104
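
The matrix-versus-list distinction mentioned in the edit can be checked directly. The sketch below is only an illustration: it uses base `lapply` instead of `map`, 5 replicates instead of 1000, and an arbitrary seed.

```r
set.seed(1)  # arbitrary seed, only so the sketch is repeatable

# replicate() simplifies the repeated test results into a list-matrix:
# one row per component of the test result, one column per replicate
m <- replicate(5, shapiro.test(rnorm(10)))
dim(m)        # 4 5
rownames(m)   # "statistic" "p.value" "method" "data.name"

# lapply() (like purrr::map()) returns a plain list with one matrix per size
l <- lapply(c(10, 30), function(n) replicate(5, shapiro.test(rnorm(n))))
class(l)      # "list"

# so the row extraction has to be applied to each list element in turn
p <- lapply(l, function(x) unlist(x["p.value", ]))
```

Each element of `p` is then an ordinary numeric vector of p-values that can be compared to a significance level.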

Another option is using the magrittr package for the extract. Your code will then look like

# magrittr::extract is an alias for `[`; the extra TRUE selects all columns, i.e. x["p.value", TRUE]
replicate_pvalue_extract <- map(replicate_swall, magrittr::extract, "p.value", TRUE) %>% unlist(recursive = FALSE)

table(replicate_pvalue_extract < 0.10) / length(replicate_pvalue_extract)

In the code above I assumed that you wanted to divide your table by the total number of replicates, regardless of the input (by input I mean the sample size: 10, 30, 50, 100, or 500). If you do care about the input you can keep the sample sizes separate; I will give the code below. Also note that I used length rather than your hard-coded /1000. This makes the code more generic: if you change the number of replicates, the divisor changes automatically. Otherwise you have to make the change in several places (especially if someone else uses your code), which can easily lead to mistakes.

replicate_pvalue_extract <- map(replicate_swall  , function(x) x["p.value",]) 

map(replicate_pvalue_extract  , function(x) table(x < 0.10) / length(x))

Or you can combine them:

map(map(replicate_swall, function(x) x["p.value",]), function(x) table(x < 0.10) / length(x))

This is why I gave you the magrittr option, as I do not like writing function(x) twice. With magrittr it would look like:

map(map(replicate_swall, magrittr::extract, "p.value", TRUE), function(x) table(x < 0.10) / length(x))

which would result in:

[[1]]

FALSE  TRUE 
0.896 0.104 

[[2]]

FALSE  TRUE 
0.889 0.111 

[[3]]

FALSE  TRUE 
0.904 0.096 

[[4]]

FALSE  TRUE 
  0.9   0.1 

[[5]]

FALSE  TRUE 
0.891 0.109
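
On the earlier point about dividing by length rather than a hard-coded 1000: base R's `prop.table()` is one alternative that avoids writing the divisor at all. This is a sketch under assumptions, not part of the answer's code; the names `n_rep` and `alpha` are illustrative, and 200 replicates are used only to keep it quick.

```r
n_rep <- 200    # illustrative; the question uses 1000
alpha <- 0.10   # significance level from the question

set.seed(1)     # arbitrary seed
p <- replicate(n_rep, shapiro.test(rnorm(10))$p.value)

# prop.table() divides each count by the total, whatever n_rep is
shares <- prop.table(table(p < alpha))

# if only the rejection rate is needed, the mean of the logical vector suffices
reject_rate <- mean(p < alpha)
```

Changing `n_rep` in one place is then enough, because nothing else in the code refers to the number of replicates.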

It gives an error when I try to replicate the Shapiro-Wilk test 1000 times for the different sample sizes using this function. Any way to solve this? `lapply(sample, function(x) replicate(1000,shapiro.test(rnorm(x))))` – FinancialRiskManagerBE Mar 15 '20 at 13:27
@StudentUantwerpen It doesn't give an error for me, with either map or lapply. – Annet Mar 15 '20 at 13:33
It just gave an output 1 minute before I modified my question! – FinancialRiskManagerBE Mar 15 '20 at 13:36
@StudentUantwerpen You are unclear. Where does the error originate? My guess is that it is in the table part rather than the map/lapply part, because you cannot use the same table call that you used on replicate_sw10. After all, replicate_sw10 is a matrix, whereas saving the result of lapply or map gives you a list of matrices. – Annet Mar 15 '20 at 13:40
@GeorgeSavva I am unsure whether it makes a difference here. It is just a personal preference in this case, as I often use map, map2, pmap etc., so map is my go-to function rather than the apply family. [This question, however, gives a nice overview of when to use map or lapply](https://stackoverflow.com/questions/45101045/why-use-purrrmap-instead-of-lapply). – Annet Mar 15 '20 at 13:45

score 2 · Accepted Answer · answered Mar 15 '20 at 14:28

You may simply use $p.value. The code below yields a matrix with 1,000 rows for the repetitions and 5 columns, one per sample size in smpl. If you want a list as the result, just use lapply instead of sapply.

smpl <- c(10, 30, 50, 100, 500)

set.seed(42)  ## for sake of reproducibility

res <- sapply(smpl, function(x) replicate(1e3, shapiro.test(rnorm(x))$p.value))
head(res)
#            [,1]      [,2]       [,3]      [,4]      [,5]
# [1,] 0.43524553 0.5624891 0.02116901 0.8972087 0.8010757
# [2,] 0.67500688 0.1417968 0.03722656 0.7614192 0.7559309
# [3,] 0.52777713 0.6728819 0.67880178 0.1455375 0.7734797
# [4,] 0.55618980 0.1736095 0.69879316 0.4950400 0.5181642
# [5,] 0.93774782 0.9077292 0.58930787 0.2687687 0.8435223
# [6,] 0.01444456 0.1214157 0.07042380 0.4479121 0.7982574
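
Since the question asks for the share of p-values below a significance level, one possible follow-up step is to take column means of the logical matrix. This is a sketch rather than part of the answer: `n_rep`, `alpha`, and the column labels are assumptions, and 200 replicates are used only to keep the run short.

```r
smpl  <- c(10, 30, 50, 100, 500)
n_rep <- 200    # illustrative; the answer uses 1e3
alpha <- 0.10   # significance level from the question

set.seed(42)
res <- sapply(smpl, function(n) replicate(n_rep, shapiro.test(rnorm(n))$p.value))
colnames(res) <- paste0("n=", smpl)   # label each column with its sample size

# proportion of replicates rejecting normality at level alpha, per sample size
reject <- colMeans(res < alpha)
reject
```

Because the data are drawn from a normal distribution, each entry of `reject` estimates the test's false rejection rate and should lie near `alpha`.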

Very simple indeed! Thanks! – FinancialRiskManagerBE Mar 16 '20 at 19:11
