1

I have two sets of data frames, the first set is like:

df1 = subset(mtcars, select="gear")

head(df1)

                  gear
Mazda RX4            4
Mazda RX4 Wag        4
Datsun 710           4
Hornet 4 Drive       3
Hornet Sportabout    3

The format of the second set is like this:

df2 = t(mtcars)[1:4,]

head(df2)

     Mazda RX4 Mazda RX4 Wag Datsun 710 Hornet 4 Drive Hornet Sportabout 
mpg         21            21       22.8           21.4              18.7    
cyl          6             6        4.0            6.0               8.0     
disp       160           160      108.0          258.0             360.0   
hp         110           110       93.0          110.0             175.0   

The row names of df1 are the column names of df2.

I would like to take all the cars that have the same gear in df1 as a group. Then calculate the mean of mpg and disp together only for the cars, followed by sorting the groups based on their means from high to low.

In this case, the expected results would be (since the cars with gear 3 have a higher mean of mpg and disp than the ones with gear 4):

        Hornet 4 Drive Hornet Sportabout Mazda RX4 Mazda RX4 Wag Datsun 710
mpg             21.4             18.7          21            21       22.8    
cyl              6.0              8.0           6             6        4.0                 
disp           258.0            360.0         160           160      108.0  
hp             110.0            175.0         110           110       93.0  

I hope this is clear to you. I do not know how to apply the groups (gear) of df1 to df2 and order df2 based on the mean of the groups. Thanks!

kin182
  • 393
  • 6
  • 13
  • I'm confused by your expected output - don't you want your output to have columns that are `gear = 3`, `gear = 4`... etc since you want to treat each `gear` as a group? – Mike H. May 10 '17 at 18:00
  • The original `df2` does not have `gear` in it. I see what you mean, it would be good to append a new row of `gear` in `df2` based on the data of `df1` and sort them as groups? – kin182 May 10 '17 at 18:06
  • I dont full understand but suggest use something like `reshape2::melt(df2)` to make in in a similar format as df1. then `dplyr::bind_rows` (or `rbind`) to make it one tall dataframe. Now you can use `dplyr::group_by` and `summarize`. You might need `reshape2:dcast` to put things together, – Andrew Lavers May 10 '17 at 18:48

1 Answers1

1

First I would transpose again df2 and merge it with df1by their row.names.

tdf2 <- t(df2)
dfmerge <- merge(df1, tdf2, by = "row.names")

Then use dplyrto calculate mean + disp grouped by gear, and also for arranging cars in descending order, from the highest mean to the lowest.

m.mpg.disp <- dfmerge %>% group_by(gear)  %>% 
              mutate(mean.mpg.disp = mean(mpg + disp)) %>% 
              arrange(desc(mean.mpg.disp))

For finally transposing the data again.

finaldf <- as.data.frame(t(m.mpg.disp))

To get the data frame exactly as yours, you could use the first row of finaldf as variable names, and delete the rows corresponding to gear and mean.mpg.disp (the mean of the sum of mean + disp).

names(finaldf) <- as.matrix(finaldf[1, ])
finaldf <- finaldf[-c(1,2,7), ]
finaldf[] <- lapply(finaldf, function(x) type.convert(as.character(x)))
finaldf

The final step I borrowed it from here

The result (just showing first three columns):

      AMC Javelin Cadillac Fleetwood Camaro Z28
 mpg         15.2               10.4       13.3
 cyl          8.0                8.0        8.0
 disp       304.0              472.0      350.0
 hp         150.0              205.0      245.0
Community
  • 1
  • 1
csmontt
  • 614
  • 8
  • 15