How to make subgroups out of a data matrix with continuous and categorical data in R

Question

I have a data set:

and wish to check assumptions. I would like to check for normality and deviation within subgroups. For example, I would like to check the distribution of my "100" group for each of the variables listed (OM, redox, etc.). Is there a way I can make a "100" group for these variables and then test them?

Thank you for your help!

Welcome to StackOverflow. Please read (1) [how do I ask a good question](http://stackoverflow.com/help/how-to-ask), (2) [How to create an MCVE](http://stackoverflow.com/help/mcve) as well as (3) [how to provide a minimal reproducible example in R](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example#answer-5963610). Then edit and improve your question accordingly. I.e., provide input data (e.g. by adding the result of `dput(mydata)` to your post, not by posting screenshots), the expected output, what lines of code you tried and in what way they failed. — lukeA, May 16 '16 at 17:27

shrgm · Answer 1 · 2016-05-16T17:45:54.490

You can apply functions to subgroups using tapply:

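set.seed(10)

# Example data: id 100:104 is recycled over 25 rows, so each id gets 5 observations
df <- data.frame(id = 100:104,
                 redox = rnorm(25,mean = 20,sd = 10),
                 depth = runif(25,min = 10,max = 30))

# Standard deviation of redox within each id
tapply(df$redox,df$id,sd)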

Which results in:

> tapply(df$redox,df$id,sd)
100       101       102       103       104 
6.181492 11.067056  4.863818 14.269076  7.962710
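If you want the "100" group as an object of its own, here is a minimal sketch using split() or a logical subset. It reuses the example df from above; the names groups, g100 and g100_alt are just illustrative.

```r
# Recreate the example data from above
set.seed(10)
df <- data.frame(id = 100:104,
                 redox = rnorm(25, mean = 20, sd = 10),
                 depth = runif(25, min = 10, max = 30))

# split() returns a named list with one data frame per id
groups <- split(df, df$id)
g100 <- groups[["100"]]          # rows where id == 100

# Equivalent direct subset with a logical condition
g100_alt <- df[df$id == 100, ]

# Standard deviation of each variable within the "100" group only
sapply(g100[, c("redox", "depth")], sd)
```

Any function that takes a numeric vector (sd, mean, shapiro.test, ...) can then be applied to g100$redox or g100$depth directly.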

If you want to apply a function to multiple columns at once, use aggregate:

aggregate(df[,2:3],by = list(df$id),sd)

Which gives:

  Group.1     redox    depth
1     100  6.181492 6.319090
2     101 11.067056 5.869627
3     102  4.863818 2.808336
4     103 14.269076 3.438697
5     104  7.962710 6.296606
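The same summary can also be written with the formula interface of aggregate, which keeps the grouping column named id instead of Group.1. This is only an alternative sketch on the example df; by_id is an illustrative name.

```r
# Recreate the example data from above
set.seed(10)
df <- data.frame(id = 100:104,
                 redox = rnorm(25, mean = 20, sd = 10),
                 depth = runif(25, min = 10, max = 30))

# cbind() on the left-hand side selects the columns to summarise,
# the right-hand side names the grouping variable
by_id <- aggregate(cbind(redox, depth) ~ id, data = df, FUN = sd)
by_id
```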

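To test for normality, you can use shapiro.test. The call below returns the W statistic for each group and column (use $p.value instead of $statistic to get the p-values):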

aggregate(df[,2:3],by = list(df$id),function(x) shapiro.test(x)$statistic)
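If you would rather see the W statistic and the p-value side by side, a small helper along these lines could work. shapiro_by_id is a hypothetical function written for this example, not part of base R. Keep in mind that shapiro.test needs between 3 and 5000 non-missing values per group, and that with only five observations per group, as in this toy data, the test has little power to detect non-normality.

```r
# Recreate the example data from above
set.seed(10)
df <- data.frame(id = 100:104,
                 redox = rnorm(25, mean = 20, sd = 10),
                 depth = runif(25, min = 10, max = 30))

# Hypothetical helper: Shapiro-Wilk W and p-value of x within each level of g
shapiro_by_id <- function(x, g) {
  t(sapply(split(x, g), function(v) {
    res <- shapiro.test(v)
    c(W = unname(res$statistic), p.value = res$p.value)
  }))
}

redox_norm <- shapiro_by_id(df$redox, df$id)   # one row per id
depth_norm <- shapiro_by_id(df$depth, df$id)
redox_norm
```

Each result is a matrix with one row per id and the columns W and p.value, so redox_norm["100", ] gives the test result for the "100" group.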
