How to list min and max values for multiple values of a given name in a data frame in R?

Question

I have a dataframe that includes a column 'names' with multiple instances of a given name, against which another column, 'time' has various times. I'm trying to convert this raw data into another dataframe that lists only unique values for the names, and gives minimum and maximum values for times in separate columns, as well as a count of instances of each name from the original dataframe. I want the final dataframe to be sorted by min.time, smallest to largest.

For example:

data example

So far I have got the unique list of names and the count of each to work, but I can't figure out how to find the minimum and maximum time value for each unique name. Any suggestions would be much appreciated.

It would be easier to help if you create an reproducible example along with expected output. Read about how to give a reproducible example https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example. — Peace Wang, Sep 21 '21 at 00:49
Please provide enough code so others can better understand or reproduce the problem. — Community, Sep 25 '21 at 20:41

score 2 · Answer 1 · answered Sep 21 '21 at 00:59

Using tidyverse:

library(tidyverse)

df <- 
tribble(
  ~name, ~time,
  'a', 19.642,
  'a', 19.644,
  'b', 20.178,
  'c', 22.345,
  'b', 20.183,
  'a', 19.646,
  'b', 20.190,
  'c', 22.332)

df %>% 
  group_by(name) %>% 
  summarise(count = n(), min.time = min(time), max.time = max(time)) %>% 
  arrange(min.time)
#> # A tibble: 3 × 4
#>   name  count min.time max.time
#>   <chr> <int>    <dbl>    <dbl>
#> 1 a         3     19.6     19.6
#> 2 b         3     20.2     20.2
#> 3 c         2     22.3     22.3

^{Created on 2021-09-20 by the reprex package (v2.0.0)}

Thanks, I used your solution with jared_mamrot's answer to guide me and some changes of my own to produce the final result I was after. See below, and thanks for your help. — Phenomniverse, Sep 21 '21 at 21:28

score 2 · Answer 2 · answered Sep 21 '21 at 00:59

2

use dplyr package dataset %>% group_by(name) %>% summarize(n= n(), min = min(time), max = max(time))

answered Sep 21 '21 at 00:59

yuliaUU

1,581
2
12
33

jared_mamrot · Answer 3 · 2021-09-21T04:24:20.510

2

Edit

Based on your comment below, this is a potential solution:

library(tidyverse)
df <- data.frame(name = c("a", "a", "b", "c", "b", "a", "b", "c"),
                 time = c(19.642, 19.644, 20.178, 22.345,
                          20.183, 19.646, 20.190, 22.332))
df2 <- df %>%
  group_by(name) %>%
  summarise(n = n(),
            min = min(time),
            max = max(time))
print.data.frame(df2)
#>   name n    min    max
#> 1    a 3 19.642 19.646
#> 2    b 3 20.178 20.190
#> 3    c 2 22.332 22.345

df3 <- df2 %>%
  arrange(desc(min)) %>%
  mutate(`max - min` = max - min)
print.data.frame(df3)
#>   name n    min    max max - min
#> 1    c 2 22.332 22.345     0.013
#> 2    b 3 20.178 20.190     0.012
#> 3    a 3 19.642 19.646     0.004

^{Created on 2021-09-21 by the reprex package (v2.0.1)}

edited Sep 21 '21 at 04:24

answered Sep 21 '21 at 01:05

jared_mamrot

22,354
4
21
46

Thanks, this is good except that it orders the output by name, not by minimum time. Also, if I want to add a fourth column to df2 which is max-min how would I do that? – Phenomniverse Sep 21 '21 at 03:18
Edited my answer to provide a potential solution. – jared_mamrot Sep 21 '21 at 06:34
Thanks, I ended up coming up with my own solution for adding the extra column and ordering by min time. See below. Thanks for your help. – Phenomniverse Sep 21 '21 at 21:26

score 1 · Answer 4 · answered Sep 21 '21 at 21:24

This is the solution that I ended up using, which draws on jpdugo17 and jared_mamrot's answers above.

library(readr)
library(tidyverse)

#Objective is to read raw data in format "compound, elution time, sample,
#datafile, ARL number, notes" and output a new data file in format "unique
#compound list, count, min elution time, max elution time, elution span", sorted 
#from lowest to highest min elution time, and excluding non-blank 'notes' fields.

raw_data<-read_csv("GC3_raw_data.csv") #reads raw data in from .csv file
df_in<-data.frame(raw_data)  #turns raw data into a data.frame
df_in<-df_in[is.na(df_in$Notes),] #removes entries that have notes
df_out <- df_in %>%
  group_by(Compound)%>%
  summarise(n=n(),
            min.elution.time = min(Elution.time),
            max.elution.time = max(Elution.time)) #creates a new dataframe with unique compound list and min/max elution times

df_out$elution.span <- df_out$max.elution.time - df_out$min.elution.time  #adds the elution span column to the new dataframe

df_out <- df_out[order(df_out$min.elution.time),] #orders the new dataframe by min.elution.time, from smallest to largest

print.data.frame(df_out)

write.csv(df_out,"GC3_clean_data.csv")

^{Created on 2021-09-22 by the reprex package (v2.0.1)}

How to list min and max values for multiple values of a given name in a data frame in R?

4 Answers4

Edit