
I have a data frame with a large number of rows and columns that contains different effort calculations and measurements for 30 people doing 6 different activities.

I would like to calculate the mean of each variable for each person and each activity, and summarize the results in a table.

The solution I have in mind is to use two nested loops, but isn't there another, faster way to do it? I recently discovered the packages dplyr, tidyr, plyr and reshape2, and I think I could use them to solve this, but I haven't managed to.

Can you help me?

 subject id_activity activity tBodyAcc-mean()-X tBodyAcc-mean()-Y tBodyAcc-mean()-Z tGravityAcc-mean()-X tGravityAcc-mean()-Y tGravityAcc-mean()-Z tBodyAccJerk-mean()-X tBodyAccJerk-mean()-Y tBodyAccJerk-mean()-Z
1        1           1  WALKING         0.2885845      -0.020294171       -0.13290514            0.9633961           -0.1408397           0.11537494            0.07799634           0.005000803         -0.0678308080
2        1           1  WALKING         0.2784188      -0.016410568       -0.12352019            0.9665611           -0.1415513           0.10937881            0.07400671           0.005771104          0.0293766330
3        1           1  WALKING         0.2796531      -0.019467156       -0.11346169            0.9668781           -0.1420098           0.10188392            0.07363596           0.003104037         -0.0090456308
4        1           1  WALKING         0.2791739      -0.026200646       -0.12328257            0.9676152           -0.1439765           0.09985014            0.07732061           0.020057642         -0.0098647722
5        1           1  WALKING         0.2766288      -0.016569655       -0.11536185            0.9682244           -0.1487502           0.09448590            0.07344436           0.019121574          0.0167799790
6        1           1  WALKING         0.2771988      -0.010097850       -0.10513725            0.9679482           -0.1482100           0.09190972            0.07793244           0.018684046          0.0093444336
7        1           1  WALKING         0.2794539      -0.019640776       -0.11002215            0.9679295           -0.1442821           0.09314463            0.08217077          -0.017014670         -0.0157981660
8        1           1  WALKING         0.2774325      -0.030488303       -0.12536043            0.9684915           -0.1467054           0.09170816            0.07236423           0.008747856         -0.0044681354
9        1           1  WALKING         0.2772934      -0.021750698       -0.12075082            0.9684812           -0.1543740           0.08511826            0.07528437           0.030762704          0.0112119500
10       1           1  WALKING         0.2805857      -0.009960298       -0.10606516            0.9684180           -0.1563020           0.08087447            0.07636932           0.012518906          0.0030843751
11       1           1  WALKING         0.2768803      -0.012721805       -0.10343832            0.9692027           -0.1523614           0.08125808            0.07139686           0.016842441          0.0010303821
12       1           1  WALKING         0.2762282      -0.021441302       -0.10820234            0.9692533           -0.1500638           0.08293121            0.07608451          -0.002311558         -0.0076736296
13       1           1  WALKING         0.2784570      -0.020414761       -0.11273172            0.9689963           -0.1523621           0.08315080            0.07710200           0.017027167         -0.0009852394
14       1           1  WALKING         0.2771750      -0.014712802       -0.10675647            0.9690440           -0.1541413           0.08181960            0.07761238           0.019489223          0.0152076830
15       1           1  WALKING         0.2979457       0.027093908       -0.06166812            0.9448949           -0.2926233          -0.02143552            0.06665616          -0.068367084         -0.0336076010

There are 10,299 rows and 56 columns; I didn't include all the columns, just a subset to show what the data looks like. Sorry for my English ^^

Aurélien
  • Use `data.table`'s `by` or `dplyr`'s `group_by` and `summarise` (see the sketch after these comments) – simone Aug 09 '17 at 08:36
  • Answered [here](https://stackoverflow.com/questions/9723208/aggregate-summarize-multiple-variables-per-group-i-e-sum-mean-etc). Note that in the most recent version of `dplyr` you might want to use `summarise_at()` instead of `summarise_each()`. – hugot Aug 09 '17 at 09:07
  • Possible duplicate of [Aggregate / summarize multiple variables per group (i.e. sum, mean, etc)](https://stackoverflow.com/questions/9723208/aggregate-summarize-multiple-variables-per-group-i-e-sum-mean-etc) – hugot Aug 09 '17 at 09:10
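
For reference, a minimal sketch of the `dplyr` approach the comments suggest, using a toy data frame (the column names here are illustrative, not from the actual dataset):

library(dplyr)

# Toy data with the same shape as the question: grouping columns
# plus several numeric measurement columns
xy <- data.frame(subject  = rep(1:2, each = 4),
                 activity = rep(c("WALKING", "SITTING"), times = 4),
                 m1 = rnorm(8),
                 m2 = rnorm(8))

# Group by person and activity, then average every remaining column
xy %>%
  group_by(subject, activity) %>%
  summarise_all(mean)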

1 Answer


You can try the function `aggregate()`; it was designed for exactly this kind of grouped summary.

# Toy data: two subjects, two activities, three measurement columns
xy <- data.frame(subj = c(1, 1, 1, 1, 2, 2, 2, 2),
                 act = c("a", "a", "b", "b", "a", "a", "b", "b"),
                 stat1 = rnorm(8),
                 stat2 = rnorm(8),
                 stat3 = rnorm(8))

xy
# Average every remaining column for each subj/act combination
aggregate(. ~ subj + act, data = xy, FUN = mean)

  subj act       stat1      stat2      stat3
1    1   a  0.10244340  0.9175242 -0.1240974
2    2   a  0.06747905 -0.3221609  0.8647476
3    1   b -0.17143146  0.9971627  0.3603535
4    2   b -1.32023632  0.6584811  0.2126244

You can also use the package data.table, which usually performs such operations faster than the base R equivalents.

library(data.table)
setDT(xy)  # convert the data.frame to a data.table by reference

# .SD holds all non-grouping columns; take the mean of each per group
xy[, lapply(.SD, mean), by = .(subj, act)]

   subj act       stat1      stat2      stat3
1:    1   a  0.10244340  0.9175242 -0.1240974
2:    1   b -0.17143146  0.9971627  0.3603535
3:    2   a  0.06747905 -0.3221609  0.8647476
4:    2   b -1.32023632  0.6584811  0.2126244
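
On your actual data the same idea applies; you just have to tell `data.table` which columns to average, since names like `tBodyAcc-mean()-X` are easiest to select by pattern. A minimal sketch, assuming your data frame is called `df` (a name invented here for illustration):

library(data.table)
setDT(df)

# Pick the measurement columns by name pattern
measure_cols <- grep("mean\\(\\)", names(df), value = TRUE)

# One row per subject/activity pair, mean of every measurement column
df[, lapply(.SD, mean), by = .(subject, activity), .SDcols = measure_cols]
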
Roman Luštrik