1

I am working with data that is collected via multiple monitors in one county, everyday. I want to retain the maximum concentration for all monitors in one county for each monitored day. First, I need to identify the rows that have reported values for the same county on the same day, then remove the rows with the lowest value of the reported values. Here is a sample data frame (p2015):

County.Code  Site.Num  Date.Local  Max.Value 
3            3         1/1/2015    30
3            3         1/1/2015    45
3            3         1/1/2015    16
4            5         1/1/2015    32
4            5         1/1/2015    18
5            6         1/1/2015    8
5            7         1/1/2015    11

I want to be left with this:

County.Code  Site.Num  Date.Local  Max.Value
3            3         1/1/2015    45
4            5         1/1/2015    32
5            7         1/1/2015    11

I thought maybe an If statement using which would do the trick but it's not working. I've tried the following with no luck:

if (p2015$County.Code == p2015$Date.Local){
p2015[ , .SD[which.min(Max.Value)]]   
}

I've searched for this and was only able to find information for removing rows based on two conditions, not three. Also, I'm just plain confused on how to extract the minimum values. Any help is appreciated.

ZoeyB
  • 11
  • 4
  • data table way `data[,list(Max.value=max(Max.value)),by=list(Country.code,Date.Local)]` – rahul Oct 04 '18 at 07:18

1 Answers1

2

Overview

Group df by County.Code and Date.Local and keep only the rows where Max.Value is equal to max(Max.Value).

Code

# load necessary packages ----
library(tidyverse)

# load necessary data -----
df <-
  read.table(text = "County.Code  Site.Num  Date.Local  Max.Value 
3            3         1/1/2015    30
3            3         1/1/2015    45
3            3         1/1/2015    16
4            5         1/1/2015    32
4            5         1/1/2015    18
5            6         1/1/2015    8
5            7         1/1/2015    11"
             , header = TRUE
             , stringsAsFactors = FALSE)

# perform calculations -----
max.value.per.county.per.day <-
  df %>%
  group_by(County.Code, Date.Local) %>%
  filter(Max.Value == max(Max.Value)) %>%
  ungroup()

# view results ---
max.value.per.county.per.day
# A tibble: 3 x 4
# County.Code Site.Num Date.Local Max.Value
#         <int>    <int> <chr>          <int>
# 1           3        3 1/1/2015          45
# 2           4        5 1/1/2015          32
# 3           5        7 1/1/2015          11

# end of script #
Cristian E. Nuno
  • 2,822
  • 2
  • 19
  • 33
  • 1
    Worked like a charm. Many thanks! – ZoeyB Oct 04 '18 at 13:54
  • @ZoeyB, if there are multiple `Site.Num` within each `County.Code`-`Date.Local` pair that are equal to `max(Max.Value)`, you can grab the first record using `slice(1)` (i.e. `df %>% groupby() %>% filter() %>% slice(1)`. – Cristian E. Nuno Oct 05 '18 at 16:43