Suppose I have a data frame in the following format:

 Group Setting Runtime Memory SomeOtherColumns
 A     X       102     105    ...
 A     X       107     80     ...
 A     Y       100     104    ...
 A     Y       101     82     ...
 B     X       10      50     ...
 B     X       11      51     ...
 B     X       8       52     ...
 B     Y       13      60     ...
 B     Y       14      61     ...
 B     Y       15      62     ...
 C     X       5       6      ...
 C     Y       6       7      ...

I would like to extract one row per Group+Setting combination, i.e., one row each for A+X, A+Y, B+X, B+Y, C+X, and C+Y. The extracted row should be the one with the lowest Runtime value within that combination.
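
As a quick check of which Runtime values the result should contain, the per-combination minimum can be computed with base R's `aggregate()`. This sketch keeps only the Group, Setting, Runtime, and Memory columns from the question's table.

```r
# Example data from the question (SomeOtherColumns omitted).
df <- data.frame(
  Group   = c("A", "A", "A", "A", "B", "B", "B", "B", "B", "B", "C", "C"),
  Setting = c("X", "X", "Y", "Y", "X", "X", "X", "Y", "Y", "Y", "X", "Y"),
  Runtime = c(102, 107, 100, 101, 10, 11, 8, 13, 14, 15, 5, 6),
  Memory  = c(105, 80, 104, 82, 50, 51, 52, 60, 61, 62, 6, 7)
)
# Minimum Runtime for every Group + Setting combination.
mins <- aggregate(Runtime ~ Group + Setting, data = df, FUN = min)
mins
```

This only returns the minima, not the full rows. Merging them back with `merge(df, mins)` would recover the other columns, but a tie on the minimum Runtime would then yield more than one row for that combination.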

The expected result is:

 Group Setting Runtime Memory SomeOtherColumns
 A     X       102     105    ...
 A     Y       100     104    ...
 B     X       8       52     ...
 B     Y       13      60     ...
 C     X       5       6      ...
 C     Y       6       7      ...
hpesoj626
Markus Weninger
  • The question that has been marked as duplicate states in its first sentence `I wish to (1) group data by one variable (State)`, yet I wanted to group by multiple variables. But the answer of @docendo discimus works fine. – Markus Weninger Apr 19 '18 at 11:55

Using dplyr, this would be:

library(dplyr)
df %>% group_by(Group, Setting) %>% slice(which.min(Runtime))
# # A tibble: 6 x 5
# # Groups:   Group, Setting [6]
# Group Setting Runtime Memory SomeOtherColumns
#  <fct> <fct>     <int>  <int> <fct>           
# 1 A     X           102    105 ...             
# 2 A     Y           100    104 ...             
# 3 B     X             8     52 ...             
# 4 B     Y            13     60 ...             
# 5 C     X             5      6 ...             
# 6 C     Y             6      7 ...   
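
Note that `which.min()` returns the position of the *first* minimum, so if two rows in a group tie on Runtime, the code above keeps the one that appears earlier. A minimal illustration:

```r
# which.min() returns the first position of the minimum value.
x <- c(13, 8, 8)
which.min(x)  # 2, not 3
```

In dplyr 1.0.0 and later, `slice_min(Runtime, n = 1, with_ties = FALSE)` should give the same one-row-per-group result; with the default `with_ties = TRUE` it would keep all tied rows instead.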

Similarly, in data.table parlance:

library(data.table)
setDT(df)
df[, .SD[which.min(Runtime)], by = .(Group, Setting)]
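
For comparison, the same per-group `which.min()` idea can be sketched in base R without extra packages: split the data into one piece per Group+Setting, pick the fastest row in each piece, and stack the results. This is an alternative technique, not part of the original answer, and it uses only the Group, Setting, Runtime, and Memory columns.

```r
# Example data from the question (SomeOtherColumns omitted).
df <- data.frame(
  Group   = c("A", "A", "A", "A", "B", "B", "B", "B", "B", "B", "C", "C"),
  Setting = c("X", "X", "Y", "Y", "X", "X", "X", "Y", "Y", "Y", "X", "Y"),
  Runtime = c(102, 107, 100, 101, 10, 11, 8, 13, 14, 15, 5, 6),
  Memory  = c(105, 80, 104, 82, 50, 51, 52, 60, 61, 62, 6, 7)
)
# One data frame per Group + Setting combination
# (drop = TRUE skips combinations that do not occur).
groups <- split(df, list(df$Group, df$Setting), drop = TRUE)
# Keep the row with the smallest Runtime in each piece, then stack them.
res <- do.call(rbind, lapply(groups, function(d) d[which.min(d$Runtime), ]))
res
```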

or by sorting on Runtime and keeping the first row of each Group+Setting combination:

unique(df[order(Runtime)], by = c("Group", "Setting"))
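
The sort-then-deduplicate approach translates directly to base R with `order()` and `duplicated()`. The following is a sketch of that equivalent (not part of the original answer), again using only four of the question's columns.

```r
# Example data from the question (SomeOtherColumns omitted).
df <- data.frame(
  Group   = c("A", "A", "A", "A", "B", "B", "B", "B", "B", "B", "C", "C"),
  Setting = c("X", "X", "Y", "Y", "X", "X", "X", "Y", "Y", "Y", "X", "Y"),
  Runtime = c(102, 107, 100, 101, 10, 11, 8, 13, 14, 15, 5, 6),
  Memory  = c(105, 80, 104, 82, 50, 51, 52, 60, 61, 62, 6, 7)
)
# Sort by Runtime so the fastest row of each combination comes first ...
sorted <- df[order(df$Runtime), ]
# ... then keep only the first occurrence of each Group + Setting pair.
res <- sorted[!duplicated(sorted[c("Group", "Setting")]), ]
res
```

The result is ordered by Runtime rather than by group; wrap it in `res[order(res$Group, res$Setting), ]` if the layout of the expected result matters.
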
talat