
I have a data set:

data

and wish to check assumptions. I would like to check for normality and look at the standard deviation within subgroups. For example, I would like to check the distribution of my "100" group for each of the variables listed (OM, redox, etc.). Is there a way to make a "100" group for these variables and then test them?

Thank you for your help!

Zheyuan Li
morinajc
    Welcome to Stack Overflow. Please read (1) [how do I ask a good question](http://stackoverflow.com/help/how-to-ask), (2) [How to create a MCVE](http://stackoverflow.com/help/mcve) as well as (3) [how to provide a minimal reproducible example in R](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example#answer-5963610). Then edit and improve your question accordingly, i.e., provide input data (e.g. by adding the result of `dput(mydata)` to your post, not by posting screenshots), the expected output, what lines of code you tried, and in what way they failed. – lukeA May 16 '16 at 17:27

1 Answer

You can apply functions to subgroups using tapply:

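set.seed(10)  # make the random example data reproducible

# Example data: ids 100-104 are recycled, giving five observations per group
df <- data.frame(id = 100:104,
                 redox = rnorm(25, mean = 20, sd = 10),
                 depth = runif(25, min = 10, max = 30))

# Standard deviation of redox within each id group
tapply(df$redox, df$id, sd)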

This results in:

> tapply(df$redox, df$id, sd)
      100       101       102       103       104 
 6.181492 11.067056  4.863818 14.269076  7.962710 
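If you only want the "100" group on its own, as the question asks, one option is to subset the rows first and then apply the function to each column. The following is a minimal sketch using the same simulated data as above; the names `grp100` and `sd100` are made up for illustration, and the column names would need to be replaced with those in your own data.

```r
set.seed(10)

# Same simulated example data as above (ids written out with rep() for clarity)
df <- data.frame(id = rep(100:104, times = 5),
                 redox = rnorm(25, mean = 20, sd = 10),
                 depth = runif(25, min = 10, max = 30))

# Keep only the rows that belong to group 100
grp100 <- df[df$id == 100, ]

# Standard deviation of each measured variable within that group
sd100 <- sapply(grp100[, c("redox", "depth")], sd)
sd100
```

The values in `sd100` should match the "100" entries that `tapply` reports for the same columns.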

If you want to apply a function to multiple columns simultaneously, use aggregate:

aggregate(df[, 2:3], by = list(df$id), sd)

Which gives:

  Group.1     redox    depth
1     100  6.181492 6.319090
2     101 11.067056 5.869627
3     102  4.863818 2.808336
4     103 14.269076 3.438697
5     104  7.962710 6.296606
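As a possible alternative, `aggregate` also has a formula interface, which names the grouping column `id` instead of `Group.1` and lets you pick the variables by name. This is a sketch on the same simulated data; `res` is just an illustrative name.

```r
set.seed(10)

# Same simulated example data as above
df <- data.frame(id = rep(100:104, times = 5),
                 redox = rnorm(25, mean = 20, sd = 10),
                 depth = runif(25, min = 10, max = 30))

# Left-hand side: the columns to summarise; right-hand side: the grouping variable
res <- aggregate(cbind(redox, depth) ~ id, data = df, FUN = sd)
res
```

Note that the formula interface drops rows with missing values by default (`na.action = na.omit`), which may or may not be what you want for real data.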

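To test for normality, you can use shapiro.test. The call below returns the Shapiro-Wilk W statistic for each group and column; use `$p.value` instead of `$statistic` to get the p-values:

aggregate(df[, 2:3], by = list(df$id), function(x) shapiro.test(x)$statistic)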
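For an actual decision about normality you would usually look at the p-value rather than the W statistic. A minimal sketch on the same simulated data follows; `pvals` is an illustrative name, and the conventional 0.05 cut-off mentioned in the comment is an assumption, not something from the question.

```r
set.seed(10)

# Same simulated example data as above
df <- data.frame(id = rep(100:104, times = 5),
                 redox = rnorm(25, mean = 20, sd = 10),
                 depth = runif(25, min = 10, max = 30))

# Shapiro-Wilk p-value for each variable within each id group
pvals <- aggregate(cbind(redox, depth) ~ id, data = df,
                   FUN = function(x) shapiro.test(x)$p.value)
pvals

# A small p-value (e.g. below 0.05) suggests a departure from normality
```

Keep in mind that `shapiro.test` needs between 3 and 5000 non-missing values per group, and that with very small groups (five observations each in this example) the test has little power to detect non-normality.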
shrgm