
The aim of this simulation is to evaluate the type 1 error rate of the tests under several combinations of factors:

  1. sample sizes: (10,10), (10,25), (25,25), (25,50), (25,100), (50,25), (50,100), (100,25), (100,100)

  2. standard deviation ratios: 1.00, 1.50, 2.00, 2.50, 3.00 and 3.50

  3. distribution: gamma distributions with equal and unequal skewness

The two-sample tests involved are the pooled-variance t-test, the Welch t-test and the Mann-Whitney test. I tried to modify a code using the above combinations of factors.

########################################
    #for normal distribution setup

# to ensure the reproducibility of the result 
#(here we declare the random seed generator) 
set.seed(1)

## Put the sample sizes into a matrix, then use a loop over the sample sizes
 sample_sizes<-matrix(c(10,10,10,25,25,25,25,50,25,100,50,25,50,100,100,25,100,100),
 nrow=2)

#create vector to combine all std deviations
sds<-matrix(c(4,4,6,4,8,4,10,4,12,4,14,4),nrow=2)

sd1<-c(4,6,8,10,12)
sd2<-c(4,4,4,4,4)
sds2<-rep(sd2,each=9)

##(use expand.grid)to create a data frame from combination of data
ss_sds1<- expand.grid(sample_sizes[2,], sd1)

#create a matrix combining the forty-five cases of combinations of ss and sds
all_combine <- cbind(rep(sample_sizes[1,], 5), ss_sds1,sds2)

# name the columns: sample sizes 1 and 2 and the two standard deviations
colnames(all_combine) <- c("m", "n", "sds1","sds2")

#number of simulations 
nSims<-10000

#set significance level,alpha for the whole simulation
alpha<-0.05       

#set up matrix for storing data from simulation
#set nrow = nSims because we want to store every p-value simulated
matrix1_equal  <-matrix(0,nrow=nSims,ncol=9)
matrix4_unequal<-matrix(0,nrow=nSims,ncol=9)
matrix7_mann   <-matrix(0,nrow=nSims,ncol=9)

#set up vector for storing data from the three tests (nrow for all_combine=45)
equal1  <- unequal4<- mann7 <- rep(0, nrow(all_combine))

  # this loop steps through the all_combine matrix
  for(ss in 1:nrow(all_combine))  
  {
   #read the sample sizes from the first and second columns
    m<-all_combine[ss,1]
    n<-all_combine[ss,2]   

      for (sim in 1:nSims)
      {
      #generate random samples from two normal distributions
      x<-rnorm(m,5,all_combine[ss,3])
      y<-rnorm(n,5,4)

      #extract p-value out and store every p-value into matrix
      matrix1_equal[sim,1]<-t.test(x,y,var.equal=TRUE)$p.value    
      matrix4_unequal[sim,4]<-t.test(x,y,var.equal=FALSE)$p.value 
      matrix7_mann[sim,7] <-wilcox.test(x,y)$p.value 
       }

     ##store the result
     equal1[ss]<- mean(matrix1_equal[,1]<=alpha)
     unequal4[ss]<-mean(matrix4_unequal[,4]<=alpha)
     mann7[ss]<- mean(matrix7_mann[,7]<=alpha)
  }

   # combine results
    nresult <- cbind(all_combine, equal1, unequal4, mann7)

    save.image(file="normal.data")

I am new to R. I have completed the code for the normal distribution and now have to add two more simulations for the gamma distribution using if/else... Can anyone please give some advice on how to change from the normal distribution to the gamma distribution? I am stuck on this part right now...

HELP!! The code above gave me a result of 0.00 several times. I have checked it many times already and yet I did not spot any mistake. Please help.
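Regarding the switch from the normal to the gamma distribution, here is a minimal, untested sketch of one way the gamma draws could be parameterized (the skewness values `skew1` and `skew2` are placeholders, not taken from the study design). It relies on the facts that a gamma distribution has skewness 2/sqrt(shape) and standard deviation sqrt(shape)*scale:

# rough, untested sketch: choose gamma parameters from a target skewness and sd
# (skewness of a gamma distribution = 2/sqrt(shape), sd = sqrt(shape)*scale)
skew1 <- 2                                # skewness of sample 1 (placeholder value)
skew2 <- 2                                # equal-skewness case; change for the unequal case

shape1 <- (2/skew1)^2
scale1 <- all_combine[ss,3]/sqrt(shape1)  # match the sd of sample 1
shape2 <- (2/skew2)^2
scale2 <- all_combine[ss,4]/sqrt(shape2)  # match the sd of sample 2

# inside the simulation loop, replace the rnorm() calls, e.g.
x <- rgamma(m, shape = shape1, scale = scale1)
y <- rgamma(n, shape = shape2, scale = scale2)

# note: the gamma mean is shape*scale, so when the sds differ the two samples
# will not share a common location; for a type 1 error study they may need to
# be shifted, e.g. x <- x - shape1*scale1 + 5 and y <- y - shape2*scale2 + 5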

  • What is `nsds`, and how did you create `t_equal`, `t_unequal`, and `mann`? They appear to be vectors, and you are writing over the matrices you want to store them in with every simulation. – mfidino Apr 06 '16 at 13:24
  • I am sorry, nsds is the length of the standard deviation ratio vector. And t_equal, t_unequal and mann are vectors into which the simulated data are stored... – j.l Apr 06 '16 at 13:32
  • I just edited the rest of the code into my question... thanks for commenting... – j.l Apr 06 '16 at 13:34

2 Answers


This is my current code; it is the same code I have now edited into the question above.

j.l

I edited your code to test for type 1 errors. Instead of having multiple nested for loops for each combination of factors, I prefer to put all of those combinations into a single matrix and do simulations with each row of said matrix. This makes it much easier to plot out the results. To speed up computation, note that I did far fewer simulations (I changed nSims), and you would want to change it back. At the end you can then combine your three result vectors with the different combinations of factors.

I have no clue what you had going on with `(ss-1)*nsds+sim` and opted to change it.

#for normal distribution setup

## Put the sample sizes into a matrix, then use a loop over the sample sizes
 sample_sizes<-
  matrix(c(10,10,10,25,25,25,25,50,25,100,50,25,50,100,100,25,100,100),
     nrow=2)

#create vector to combine all std deviations
sds<-c(4,6,8,10,12,14)

# get all combinations with one row of the sample_sizes matrix
all_combn <- expand.grid(sample_sizes[2,], sds)

# tack on the first row

all_combn <- cbind(rep(sample_sizes[1,], 6), all_combn)
# change the column names
colnames(all_combn) <- c("ss1", "ss2", "sds")


# to ensure the reproducibility of the result 
#(here we declare the random seed generator) 
set.seed(1)

#number of simulations 
nSims<-500    

# to store your simulations for the three tests
store_sim <- matrix(0, nrow = nSims, ncol = 3)
#set significance level, alpha, for the whole simulation
alpha<-0.05       


#set up vector for storing data from the three tests

equal  <- unequal<- mann <- rep(0, nrow(all_combn))


# the outer loop runs nSims simulations for every combination of std deviations and sample sizes


  # this loop steps through the all_combn matrix
  for(ss in 1:nrow(all_combn))  
  {
    m<-all_combn[ss,1]
    n<-all_combn[ss,2]   

      for (sim in 1:nSims)
      {
      #generate random samples from two normal distributions
      x<-rnorm(m,5,all_combn[ss,3])
      y<-rnorm(n,5,4)

      #extract the p-values and store them in the store_sim matrix
      store_sim[sim,1]<-t.test(x,y,var.equal=TRUE)$p.value    
      store_sim[sim,2]<-t.test(x,y,var.equal=FALSE)$p.value 
      store_sim[sim,3] <-wilcox.test(x,y)$p.value 

    }

  ##store the result into matrix defined before
  equal[ss]<- sum(store_sim[,1]<alpha)/nSims
  unequal[ss]<- sum(store_sim[,2]<alpha)/nSims
  mann[ss]<- sum(store_sim[,3]<alpha)/nSims  # column 3 holds the Mann-Whitney p-values
  }


# combine results

answer <- cbind(all_combn, equal, unequal, mann)

head(answer)

  ss1 ss2 sds equal unequal  mann
1  10  10   4 0.070   0.062 0.062
2  10  25   4 0.046   0.048 0.048
3  25  25   4 0.048   0.048 0.048
4  25  50   4 0.038   0.048 0.048
5  25 100   4 0.058   0.054 0.054
6  50  25   4 0.048   0.054 0.054
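As an illustration of how this flat result format lends itself to plotting (a minimal sketch; the (25, 25) sample-size pair and the colours are arbitrary choices):

# plot the type 1 error rate against the sd of sample 1 for the (25, 25) pair
sub <- answer[answer$ss1 == 25 & answer$ss2 == 25, ]
plot(sub$sds, sub$equal, type = "b", ylim = c(0, 0.15),
     xlab = "sd of sample 1", ylab = "type 1 error rate")
lines(sub$sds, sub$unequal, type = "b", col = "red")
lines(sub$sds, sub$mann, type = "b", col = "blue")
abline(h = 0.05, lty = 2)  # nominal alpha
legend("topleft", legend = c("pooled t", "Welch t", "Mann-Whitney"),
       col = c("black", "red", "blue"), lty = 1)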
mfidino
  • Thanks a lot. Even though I do not fully understand it, really thanks for your comment. I just found out that I do not really understand the nested for loop part. – j.l Apr 07 '16 at 04:55
  • Hi, can I ask a simple question, because I just noticed this: why is the number of columns of the matrix for storing the p-values set to the number of simulations instead of the number of calculations? I am wondering about this part.... – j.l Apr 07 '16 at 12:14
  • For every unique combination of your factors (e.g. `sample_sizes` and `sds`) you will want to do a number of simulations. To test for type 1 error rate you will need to see how many of those simulations had p values less than your specified alpha. As such, you will need to store each p-value from each simulation. The `store_sim` matrix has 3 columns (one for each test) and `nSims` rows (one for each simulation). – mfidino Apr 07 '16 at 21:35
  • Alright, I get the basic idea now... for the last part of equal[ss]<- sum(store_sim[,1] – j.l Apr 08 '16 at 15:00
  • Yes. That part is there to compute the type 1 error rate. The first loop takes a row from `all_combn` and then we make `nSims` simulations with that row, storing them in `store_sim`. After those simulations we compute the error rate with the rows you mentioned and store those into a vector. Once that is done the next row of data is taken from `all_combn` and we repeat the simulations until the error rate has been calculated for each row in `all_combn`. – mfidino Apr 08 '16 at 15:25
  • Morning, is there any possible way of separating the results so that they are displayed more neatly.... for example, after one set of sample sizes with all the standard deviation ratios, spacing one row.... can I use "\n" for this? – j.l Apr 10 '16 at 01:36
  • Really appreciate your helping hands :) thanks a lot – j.l Apr 10 '16 at 01:47
  • There sure are, but those types of questions have already been answered on the site. See http://stackoverflow.com/questions/1296646/how-to-sort-a-dataframe-by-columns for how to order the data.frame by multiple columns and http://stackoverflow.com/questions/16249903/add-a-new-row-in-specific-place-in-a-dataframe for how to add rows; a minimal ordering sketch is shown after these comments. – mfidino Apr 11 '16 at 14:32
  • I tried to apply your suggestion to the gamma distribution, but the for loop did not iterate as expected... – j.l Apr 18 '16 at 04:05
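For example, a minimal sketch of ordering the results by multiple columns (assuming the `answer` data frame from above):

# sort by sample size 1, then sample size 2, then sd
answer_sorted <- answer[order(answer$ss1, answer$ss2, answer$sds), ]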