0

I have a dataframe that includes a column 'names' with multiple instances of a given name, against which another column, 'time' has various times. I'm trying to convert this raw data into another dataframe that lists only unique values for the names, and gives minimum and maximum values for times in separate columns, as well as a count of instances of each name from the original dataframe. I want the final dataframe to be sorted by min.time, smallest to largest.

For example:

data example

So far I have got the unique list of names and the count of each to work, but I can't figure out how to find the minimum and maximum time value for each unique name. Any suggestions would be much appreciated.

AnilGoyal
  • 25,297
  • 4
  • 27
  • 45
Phenomniverse
  • 309
  • 1
  • 8
  • 2
    It would be easier to help if you create an reproducible example along with expected output. Read about how to give a reproducible example https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example. – Peace Wang Sep 21 '21 at 00:49
  • `aggregate(time~name, df, \(x)c(length(x), range(x)))` – Onyambu Sep 21 '21 at 02:39
  • Please provide enough code so others can better understand or reproduce the problem. – Community Sep 25 '21 at 20:41

4 Answers4

2

Using tidyverse:

library(tidyverse)

df <- 
tribble(
  ~name, ~time,
  'a', 19.642,
  'a', 19.644,
  'b', 20.178,
  'c', 22.345,
  'b', 20.183,
  'a', 19.646,
  'b', 20.190,
  'c', 22.332)

df %>% 
  group_by(name) %>% 
  summarise(count = n(), min.time = min(time), max.time = max(time)) %>% 
  arrange(min.time)
#> # A tibble: 3 × 4
#>   name  count min.time max.time
#>   <chr> <int>    <dbl>    <dbl>
#> 1 a         3     19.6     19.6
#> 2 b         3     20.2     20.2
#> 3 c         2     22.3     22.3

Created on 2021-09-20 by the reprex package (v2.0.0)

jpdugo17
  • 6,816
  • 2
  • 11
  • 23
  • 1
    Thanks, I used your solution with jared_mamrot's answer to guide me and some changes of my own to produce the final result I was after. See below, and thanks for your help. – Phenomniverse Sep 21 '21 at 21:28
2

use dplyr package dataset %>% group_by(name) %>% summarize(n= n(), min = min(time), max = max(time))

yuliaUU
  • 1,581
  • 2
  • 12
  • 33
2

Edit

Based on your comment below, this is a potential solution:

library(tidyverse)
df <- data.frame(name = c("a", "a", "b", "c", "b", "a", "b", "c"),
                 time = c(19.642, 19.644, 20.178, 22.345,
                          20.183, 19.646, 20.190, 22.332))
df2 <- df %>%
  group_by(name) %>%
  summarise(n = n(),
            min = min(time),
            max = max(time))
print.data.frame(df2)
#>   name n    min    max
#> 1    a 3 19.642 19.646
#> 2    b 3 20.178 20.190
#> 3    c 2 22.332 22.345

df3 <- df2 %>%
  arrange(desc(min)) %>%
  mutate(`max - min` = max - min)
print.data.frame(df3)
#>   name n    min    max max - min
#> 1    c 2 22.332 22.345     0.013
#> 2    b 3 20.178 20.190     0.012
#> 3    a 3 19.642 19.646     0.004

Created on 2021-09-21 by the reprex package (v2.0.1)

jared_mamrot
  • 22,354
  • 4
  • 21
  • 46
1

This is the solution that I ended up using, which draws on jpdugo17 and jared_mamrot's answers above.

library(readr)
library(tidyverse)

#Objective is to read raw data in format "compound, elution time, sample,
#datafile, ARL number, notes" and output a new data file in format "unique
#compound list, count, min elution time, max elution time, elution span", sorted 
#from lowest to highest min elution time, and excluding non-blank 'notes' fields.

raw_data<-read_csv("GC3_raw_data.csv") #reads raw data in from .csv file
df_in<-data.frame(raw_data)  #turns raw data into a data.frame
df_in<-df_in[is.na(df_in$Notes),] #removes entries that have notes
df_out <- df_in %>%
  group_by(Compound)%>%
  summarise(n=n(),
            min.elution.time = min(Elution.time),
            max.elution.time = max(Elution.time)) #creates a new dataframe with unique compound list and min/max elution times

df_out$elution.span <- df_out$max.elution.time - df_out$min.elution.time  #adds the elution span column to the new dataframe

df_out <- df_out[order(df_out$min.elution.time),] #orders the new dataframe by min.elution.time, from smallest to largest

print.data.frame(df_out)

write.csv(df_out,"GC3_clean_data.csv")

Created on 2021-09-22 by the reprex package (v2.0.1)

Phenomniverse
  • 309
  • 1
  • 8