
Is there a way to create relative numbers based on multiple factors without creating multiple subsets of an R data frame?

For example, in mtcars I want each car's 'hp' expressed relative to the mean hp of its 'am' and 'gear' group.

I could subset mtcars by am and then by gear, add a new column with hp relative to the subset mean, and rbind all the subsets together again. However, I think this can be done in a simpler and more elegant way, perhaps with plyr, but I have not found a solution.

Almstrup

1 Answer


Not exactly sure what you are looking for. You could use dplyr:

library(dplyr)

# add the per-group mean of hp as a new column
mtcars %>%
  group_by(am, gear) %>%
  mutate(hp_group = mean(hp))

which returns (first 10 rows shown)

     mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb hp_group
   <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>    <dbl>
 1  21       6  160    110  3.9   2.62  16.5     0     1     4     4     83.9
 2  21       6  160    110  3.9   2.88  17.0     0     1     4     4     83.9
 3  22.8     4  108     93  3.85  2.32  18.6     1     1     4     1     83.9
 4  21.4     6  258    110  3.08  3.22  19.4     1     0     3     1    176. 
 5  18.7     8  360    175  3.15  3.44  17.0     0     0     3     2    176. 
 6  18.1     6  225    105  2.76  3.46  20.2     1     0     3     1    176. 
 7  14.3     8  360    245  3.21  3.57  15.8     0     0     3     4    176. 
 8  24.4     4  147.    62  3.69  3.19  20       1     0     4     2    101. 
 9  22.8     4  141.    95  3.92  3.15  22.9     1     0     4     2    101. 
10  19.2     6  168.   123  3.92  3.44  18.3     1     0     4     4    101. 

The column hp_group is the mean of hp for all cars sharing the same am and gear.
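To get the relative value the question asks for, the same pipeline can divide hp by the group mean. The following is a minimal sketch rather than the only way to do it; the names `rel`, `hp_rel` and `hp_rel_base` are chosen here for illustration and do not appear in the original answer. The last line shows a base R alternative using `ave()`, which by default returns the group mean for each row.

```r
library(dplyr)

# hp relative to the mean hp of each (am, gear) group
rel <- mtcars %>%
  group_by(am, gear) %>%
  mutate(hp_group = mean(hp),
         hp_rel   = hp / hp_group) %>%  # 1 means "exactly the group average"
  ungroup()

# Base R alternative without extra packages:
# ave() returns the group mean (its default FUN) aligned with the original rows
hp_rel_base <- mtcars$hp / ave(mtcars$hp, mtcars$am, mtcars$gear)
```

Both approaches keep the original row order, so no subsetting and rbind step is needed.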

Martin Gal
  • Thanks. Nearly perfect. I was looking for hp/hp_group (the hp relative to the group mean), but that should be easy to add. – Almstrup Aug 04 '20 at 07:35
  • I found that the result depends on whether plyr is also loaded. The above code works with dplyr alone. With plyr also loaded, hp_group has the same value in all rows. Use detach("package:plyr", unload = TRUE) so that the dplyr functions are used. – Almstrup Aug 04 '20 at 09:55
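A likely explanation for the plyr difference reported in the last comment (an assumption, since the comment does not say in which order the packages were attached): when plyr is attached after dplyr, `plyr::mutate` masks `dplyr::mutate` and ignores the grouping, so `mean(hp)` is computed over all rows. As an alternative to detaching plyr, a sketch that namespace-qualifies the dplyr calls so the result does not depend on what else is attached (the names `res` and `hp_rel` are illustrative):

```r
library(dplyr)

# Explicit dplyr:: prefixes are unaffected by plyr being attached later
res <- mtcars %>%
  dplyr::group_by(am, gear) %>%
  dplyr::mutate(hp_group = mean(hp),
                hp_rel   = hp / hp_group) %>%
  dplyr::ungroup()
```

Attaching plyr before dplyr should also avoid the masking, since the package attached last takes precedence.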