I'm looking for a way to produce descriptive statistics by group number in R. There is another answer on here I found, which uses dplyr, but I'm having too many problems with it and would like to see what alternatives others might recommend.

I'm looking to obtain descriptive statistics on revenue grouped by group_id. Let's say I have a data frame called company:

group_id    company     revenue
1          Company A    200
1          Company B    150
1          Company C    300
2          Company D    600
2          Company E    800
2          Company F    1000
3          Company G    50
3          Company H    80
3          Company H    60

and I'd like to produce a new data frame called new_company:

group_id    company     revenue  average  min  max   SD
1          Company A    200      217      150  300   62
1          Company B    150      217      150  300   62
1          Company C    300      217      150  300   62
2          Company D    600      800      600  1000  163
2          Company E    800      800      600  1000  163
2          Company F    1000     800      600  1000  163
3          Company G    50       63       50   80    12
3          Company H    80       63       50   80    12
3          Company H    60       63       50   80    12

Again, I'm looking for alternatives to dplyr. Thank you

BlueDevilPride

2 Answers


Using the sample data frame

dd<-read.csv(text="group_id,company,revenue
1,Company A,200
1,Company B,150
1,Company C,300
2,Company D,600
2,Company E,800
2,Company F,1000
3,Company G,50
3,Company H,80
3,Company H,60", header=TRUE)

You could do something fancy like use ave() to compute each summary function per group, repeated on every row of that group, and then just combine the results with the original data.frame.

ext <- with(dd, Map(function(x) ave(revenue, group_id, FUN=x), 
    list(avg=mean, min=min, max=max, SD=sd)))
cbind(dd, ext)
#   group_id   company revenue       avg min  max        SD
# 1        1 Company A     200 216.66667 150  300  76.37626
# 2        1 Company B     150 216.66667 150  300  76.37626
# 3        1 Company C     300 216.66667 150  300  76.37626
# 4        2 Company D     600 800.00000 600 1000 200.00000
# 5        2 Company E     800 800.00000 600 1000 200.00000
# 6        2 Company F    1000 800.00000 600 1000 200.00000
# 7        3 Company G      50  63.33333  50   80  15.27525
# 8        3 Company H      80  63.33333  50   80  15.27525
# 9        3 Company H      60  63.33333  50   80  15.27525
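
Note that the SD values in the question (62, 163, 12) differ from the sd() output above (76, 200, 15). The question's numbers are consistent with a population standard deviation (dividing by n), whereas sd() divides by n - 1. Below is a minimal sketch of how the same ave() approach could reproduce either one; pop_sd is a hypothetical helper written for this example, not a base R function.

```r
# Sample data from the question
dd <- read.csv(text = "group_id,company,revenue
1,Company A,200
1,Company B,150
1,Company C,300
2,Company D,600
2,Company E,800
2,Company F,1000
3,Company G,50
3,Company H,80
3,Company H,60", header = TRUE)

# Hypothetical helper (not in base R): population SD, i.e. divide by n
pop_sd <- function(x) sqrt(mean((x - mean(x))^2))

# ave() returns one value per row, computed within each group_id
dd$SD_sample <- ave(dd$revenue, dd$group_id, FUN = sd)      # n - 1 denominator
dd$SD_pop    <- ave(dd$revenue, dd$group_id, FUN = pop_sd)  # n denominator

print(dd)
```

Rounded to whole numbers, SD_pop gives 62, 163 and 12 for the three groups, matching the table in the question, while SD_sample gives 76, 200 and 15.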

but really a simple dplyr command would be easier.

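library(dplyr)  # provides %>%, group_by() and mutate()

# add the per-group summaries as new columns on every row
dd %>% group_by(group_id) %>% 
  mutate(
    avg=mean(revenue), 
    min=min(revenue), 
    max=max(revenue), 
    SD=sd(revenue))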
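
If plyr is attached after dplyr, plyr's mutate() masks dplyr's version and the grouping is ignored. One way to sidestep that, sketched here as an assumption about the setup rather than a required fix, is to call the dplyr functions with an explicit namespace prefix. The requireNamespace() guard is only there so the snippet does nothing if dplyr is not installed.

```r
# Sample data from the question
dd <- read.csv(text = "group_id,company,revenue
1,Company A,200
1,Company B,150
1,Company C,300
2,Company D,600
2,Company E,800
2,Company F,1000
3,Company G,50
3,Company H,80
3,Company H,60", header = TRUE)

if (requireNamespace("dplyr", quietly = TRUE)) {
  # dplyr:: makes sure these are dplyr's functions, whatever else is attached
  grouped <- dplyr::group_by(dd, group_id)
  new_company <- dplyr::mutate(
    grouped,
    avg = mean(revenue),
    min = min(revenue),
    max = max(revenue),
    SD  = sd(revenue)
  )
  print(as.data.frame(new_company))
}
```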
MrFlick
  • Thanks, Mr. Flick. The last solution works, and I just figured out that I had plyr loaded, which was causing it to miss the group_by command altogether. Thanks again -- all good to go. – BlueDevilPride Dec 12 '16 at 21:52

Another function I like to use is: describeBy from package "psych".

library(psych)
# df is your data frame; replace the column names with your own
description <- describeBy(df$variable_to_be_described, df$group_variable)
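
Applied to the sample data from the question, that call might look like the sketch below. It assumes the psych package is installed (the guard skips the call otherwise). Passing mat = TRUE asks for a single data frame with one row per group instead of a list; note that this gives one summary row per group, not the per-row columns shown in the question.

```r
# Sample data from the question
dd <- read.csv(text = "group_id,company,revenue
1,Company A,200
1,Company B,150
1,Company C,300
2,Company D,600
2,Company E,800
2,Company F,1000
3,Company G,50
3,Company H,80
3,Company H,60", header = TRUE)

if (requireNamespace("psych", quietly = TRUE)) {
  # one row of descriptive statistics per group_id
  by_group <- psych::describeBy(dd$revenue, group = dd$group_id, mat = TRUE)
  print(by_group)
}
```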
George GL