
I'd like to create a data frame that keeps, for each marker, the row with the largest value in the Height column within each sample, where a sample is identified by Sample_Type and Concentration. I've pasted a sample data frame below. The final df in this example should contain rows 2-4.

df1 <- structure(list(
  Marker = c("A", "A", "B", "B", "B", "B", "C", "A", "A", "A"),
  Height = c(40L, 61L, 38L, 33L, 49L, 114L, 152L, 108L, 108L, 50L),
  Sample_Type = c("NTC", "NTC", "NTC", "NTC", "NTC", "NTC", "NTC",
                  "CEPH", "CEPH", "CEPH"),
  Concentration = c(100L, 100L, 100L, 100L, 100L, 100L, 100L, 100L, 50L, 50L),
  PCR_Protocol = c("Current_PCR", "Current_PCR", "Current_PCR", "Current_PCR",
                   "Current_PCR", "Current_PCR", "Current_PCR", "Current_PCR",
                   "Current_PCR", "Current_PCR")),
  class = "data.frame", row.names = c(NA, -10L))

Thank you!

cbaudo
  • Possible duplicate of [How to select the row with the maximum value in each group](https://stackoverflow.com/questions/24558328/how-to-select-the-row-with-the-maximum-value-in-each-group) – IceCreamToucan Dec 18 '18 at 21:44

1 Answer


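Using dplyr, group by Marker and keep the rows whose Height equals the group maximum:

library(dplyr)

df1 %>% 
  group_by(Marker) %>%               # one group per marker
  filter(Height == max(Height))      # keep rows at the group maximum (ties are all kept)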
# # A tibble: 3 x 6
# # Groups:   Marker [3]
#   Marker  Size Height Sample_Type Concentration PCR_Protocol
#   <chr>  <dbl>  <int> <chr>               <int> <chr>       
# 1 A       79.2     61 NTC                   100 Current_PCR 
# 2 B       84.2     38 NTC                   100 Current_PCR 
# 3 C       99.7     33 NTC                   100 Current_PCR 
zx8754
  • Thanks for your response, zx8754. There are several other Sample_Types and Concentrations. This result currently returns the highest value across all sample types. Sorry if I wasn't clear, but I'm interested in retrieving the highest value for each marker within each specific sample type and each particular concentration. – cbaudo Dec 18 '18 at 21:55
  • @cbaudo then please provide representative data, and expected output. – zx8754 Dec 18 '18 at 21:56
  • I've edited the sample data. The output should be rows 3, 7, 8, 9, and 10 – cbaudo Dec 18 '18 at 22:11
  • @cbaudo then add them into group_by: `group_by(Marker, Sample_Type, Concentration)` – zx8754 Dec 18 '18 at 22:22
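
Putting the last comment together with the answer, a minimal sketch of the full pipeline might look like the following. It rebuilds the sample data from the question with `data.frame()` instead of the `dput` output, and the names `df1` and `res` are only illustrative. The trailing `ungroup()` is an optional addition, not part of the original answer.

```r
library(dplyr)

# Sample data from the question, rebuilt with data.frame() for readability
df1 <- data.frame(
  Marker        = c("A", "A", "B", "B", "B", "B", "C", "A", "A", "A"),
  Height        = c(40L, 61L, 38L, 33L, 49L, 114L, 152L, 108L, 108L, 50L),
  Sample_Type   = c(rep("NTC", 7), rep("CEPH", 3)),
  Concentration = c(100L, 100L, 100L, 100L, 100L, 100L, 100L, 100L, 50L, 50L),
  PCR_Protocol  = "Current_PCR",
  stringsAsFactors = FALSE
)

res <- df1 %>%
  # one group per marker within each sample type and concentration
  group_by(Marker, Sample_Type, Concentration) %>%
  # keep the rows whose Height equals the maximum of their group
  filter(Height == max(Height)) %>%
  # drop the grouping so later steps operate on the whole table
  ungroup()

print(res)
```

Note that `filter(Height == max(Height))` keeps every row tied for the maximum within a group. If exactly one row per group is wanted, `slice_max(Height, n = 1, with_ties = FALSE)` (available in dplyr 1.0.0 and later) is one option to use in place of the `filter()` step.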