
I'm pretty new to statistics and I need your help. I just installed R and I have no idea how to work with it. I have a small sample that looks as follows:

Group A : 10, 12, 14, 19, 20, 23, 34, 41, 12, 13
Group B :  8, 12, 14, 15, 15, 16, 21, 36, 14, 19

I want to apply a t-test, but before that I would like to apply the Shapiro-Wilk test to check whether my sample comes from a normally distributed population. I know there is a function shapiro.test(), but how can I give my numbers as input to this function?

Can I simply enter shapiro.test(10,12,14,19,20,23,34,41,12,13, 8,12, 14,15,15,16,21,36,14,19)?

Bahador Saket
    I know you probably want to get this working _NOW_ but it sounds like you might also find a basic "how to use R" tutorial useful. That will help you learn how to use the help functions, and then you can start working this out for yourself. I find the reference cards very useful: try http://cran.r-project.org/doc/contrib/Short-refcard.pdf, or http://stackoverflow.com/questions/1744861/how-to-learn-r-as-a-programming-language/1744882#1744882. Basically, if you type help(shapiro.test) you'll get some examples you can try. – Andy Clifton Aug 10 '14 at 00:54

1 Answer


OK, because I'm feeling nice, let's work through this. I am assuming you know how to run commands, etc. First up, put your data into vectors:

A = c(10, 12, 14, 19, 20, 23, 34, 41, 12, 13)
B = c(8, 12, 14, 15, 15, 16, 21, 36, 14, 19)
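
Before running any test, it can be worth confirming that the vectors hold what you expect. A minimal sketch using only base R functions, with the values from the question:

```r
A <- c(10, 12, 14, 19, 20, 23, 34, 41, 12, 13)
B <- c(8, 12, 14, 15, 15, 16, 21, 36, 14, 19)

# Each group should be a numeric vector with 10 observations
is.numeric(A)
is.numeric(B)
length(A)
length(B)

# Quick numeric summaries (min, quartiles, mean, max) of each group
summary(A)
summary(B)
```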

Let's check the help for shapiro.test().

help(shapiro.test)

In there you'll see the following:

Usage

shapiro.test(x)

Arguments

x a numeric vector of data values. Missing values are allowed, but the number of non-missing values must be between 3 and 5000.

So, the input needs to be a vector of values. Now we know that we can run the shapiro.test() function directly with our vectors, A and B. R uses named arguments for most of its functions, so we tell the function what we are passing in:

shapiro.test(x = A)

and the result is printed to the screen:

Shapiro-Wilk normality test

data:  A
W = 0.8429, p-value = 0.0478

then we can do the same for B:

shapiro.test(x = B)

which gives us

Shapiro-Wilk normality test

data:  B
W = 0.8051, p-value = 0.0167
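
The printed output is convenient for reading, but the result can also be stored and its pieces pulled out by name. A small sketch, where res_A, res_B and alpha are only illustrative names and the 0.05 threshold is an assumption; the statistic and p.value components are the ones listed in the Value section of the help page:

```r
A <- c(10, 12, 14, 19, 20, 23, 34, 41, 12, 13)
B <- c(8, 12, 14, 15, 15, 16, 21, 36, 14, 19)

# Passing the vector by position is equivalent to naming it with x =
res_A <- shapiro.test(A)
res_B <- shapiro.test(x = B)

# The result is a list-like object; extract the pieces you need
res_A$statistic   # the W statistic
res_A$p.value     # the p-value as a plain number
res_B$p.value

# Compare each p-value against a chosen significance level
# (0.05 is an assumed threshold, not something the test prescribes)
alpha <- 0.05
c(A = res_A$p.value, B = res_B$p.value) < alpha
```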

If we want, we can test A and B together, although it's hard to know whether this is a valid test. By 'valid', I mean: imagine that you are pulling numbers out of a bag to get A and B. If the numbers in A get thrown back in the bag before we draw B, we've just double counted. If the numbers in A didn't get thrown back in, testing x = c(A, B) is reasonable because all we've done is increase the size of our sample.

shapiro.test(x = c(A,B))
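
This also covers the call proposed in the question. As a rough sketch of the difference: shapiro.test() takes a single vector x, so a call with several separate numbers is rejected, while wrapping the same numbers in c() is the same as passing c(A, B). The names direct and all_values are only illustrative.

```r
A <- c(10, 12, 14, 19, 20, 23, 34, 41, 12, 13)
B <- c(8, 12, 14, 15, 15, 16, 21, 36, 14, 19)

# Separate numbers are treated as separate arguments, which
# shapiro.test() does not accept; try() captures the resulting error
direct <- try(shapiro.test(10, 12, 14, 19, 20), silent = TRUE)
inherits(direct, "try-error")

# Wrapping the numbers in c() builds one vector, the same as combining A and B
all_values <- c(10, 12, 14, 19, 20, 23, 34, 41, 12, 13,
                8, 12, 14, 15, 15, 16, 21, 36, 14, 19)
identical(all_values, c(A, B))

# This call is therefore equivalent to shapiro.test(x = c(A, B))
shapiro.test(x = all_values)
```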

Do these mean that the data are normally distributed? Well, in the help we see this:

Value

...

p.value an approximate p-value for the test. This is said in Royston (1995) to be adequate for p.value < 0.1

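That note is about the accuracy of the approximate p-value. Both p-values here (0.0478 for A and 0.0167 for B) are below 0.1, so the approximation should be adequate. Both are also below 0.05, so at the 5% significance level the test rejects the hypothesis that each group was drawn from a normal distribution. Whether that is a problem depends on your requirements!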
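
Since the end goal in the question is a t-test, here is a rough sketch of that call for the same vectors. It assumes the two groups are independent samples, which the question does not state. By default t.test() runs Welch's version, which does not assume equal variances. Whether a t-test is appropriate given the Shapiro-Wilk results is still your call; tt is only an illustrative name.

```r
A <- c(10, 12, 14, 19, 20, 23, 34, 41, 12, 13)
B <- c(8, 12, 14, 15, 15, 16, 21, 36, 14, 19)

# Two-sample t-test, assuming A and B are independent groups
tt <- t.test(A, B)

tt$method     # which variant of the test was run
tt$estimate   # the two group means
tt$p.value    # p-value for the difference in means
```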

Andy Clifton
  • Thanks Andy, I have another quick question. To be able to apply a t-test, should the data in both groups A and B come from a normal distribution? In other words, should the p-values from shapiro.test(A) and shapiro.test(B) be > 0.05? – Bahador Saket Aug 10 '14 at 01:11
  • @BahadorSaket See edit. You need to decide for yourself if this is good enough. – Andy Clifton Aug 10 '14 at 01:18