0

In the below dataframe, is it possible to find the Date at which maximum occured by groups

df
Date        Var    Value
27/9/2019    A       56
28/9/2019    A       50
1/10/2019    B       90
2/10/2019    B       100

df1      Max         Date          Mean
A        56        27/9/2019        53
B        100       2/10/2019        95
Dev P
  • 449
  • 3
  • 12
  • I tried with ````df1 <- df %>% group_by(Var) %>% summarise(Date = df$Date[which.max(df$Value)])```` But could not get – Dev P Oct 13 '19 at 03:17

3 Answers3

2

We can group_by Var, calculate the mean of Value and select the row with maximum value.

library(dplyr)

df %>%
  group_by(Var) %>%
  mutate(Mean = mean(Value)) %>%
  slice(which.max(Value))

#  Date      Var   Value  Mean
#  <fct>     <fct> <int> <dbl>
#1 27/9/2019 A        56    53
#2 2/10/2019 B       100    95
Ronak Shah
  • 377,200
  • 20
  • 156
  • 213
  • This is efficient and perhaps qualifying the `mutate` as `dplyr::mutate(Mean = mean(Value))` might make it explicit. +1. – deepseefan Oct 13 '19 at 05:31
0

There might be a better way to do this. Since you want a summary which reduce multiple values down to a single value. You could *_join the output of the filter with the summary table as follows:

library(dplyr)
df1 <- df %>%
  group_by(Var) %>%
  filter(Value == max(Value)) %>%
  select(df1=Var, Max=Value, Date)

df2 <-df %>%
  group_by(Var) %>%
  summarise_at(.vars = vars(Value),
               .funs = c(mean="mean", sd="sd")) 
df2 %>%
  left_join(df1, by = "Var") %>%
  select(Var, Value, Date, mean, sd)

# -------------------------------------------------------------------------

# # A tibble: 2 x 5
#   Var   Value Date       mean    sd
#   <chr> <dbl> <chr>     <dbl> <dbl>
# 1 A        56 27/9/2019    53  4.24
# 2 B       100 2/10/2019    95  7.07

Data

df <- data.frame(
  Date = c("27/9/2019", "28/9/2019", "1/10/2019", "2/10/2019"), 
  Var = c("A", "A", "B", "B"), 
  Value = c(56, 50, 90, 100), stringsAsFactors = F
)

Hope that is what you want.

Community
  • 1
  • 1
deepseefan
  • 3,701
  • 3
  • 18
  • 31
  • thanks. Is it not possible by summarise. Because I need to find mean, sd and for the groups as well. I was able to do that. So basically my code goes like this ````df1 <- df %>% group_by(Var) %>% summarise(Std = round(sd(Value),2),Mean = round(mean(Value),2),Max = round(max(Value),2),filter(Value == max(Value)) %>% arrange(Value,Var, Date))```` But not working – Dev P Oct 13 '19 at 03:43
  • @DevP, I edited my answer in a way I taught you might use `summarise`. – deepseefan Oct 13 '19 at 05:37
0

Base R, split-apply-combine (Edited):

# Create df, ensure date vec has appropriate type: 

df <- data.frame(

  Date = as.Date(c("27/9/2019", "28/9/2019", "1/10/2019", "2/10/2019"), "%d/%m/%y"),

  Var = c("A", "A", "B", "B"), 

  Value = c(56, 50, 90, 100),

  stringsAsFactors = F
)

# Split df by "Var" values: 

split_applied_combined <- lapply(split(df, df$Var), function(x){

# Calculate the max date: 

    max_date <- x$Date[which(x$Value == max(x$Value))]

    # Calculate the mean: 

    mean_val <- mean(x$Value)

    # Calculate the std_dev: 

    sd_val <- sd(x$Value)

   # Combine vects into df: 

    summarised_df <- data.frame(max_date, mean_val, sd_val)

    }
  )

# Combine list back into dataframe:

split_applied_combined <- do.call(rbind, 

                          # Store df name as vect:

                                  mapply(cbind,

                                         "Var" = names(split_applied_combined),

                                         split_applied_combined,

                                         SIMPLIFY = FALSE))

Dplyr alternative:

require("dplyr")

# Group by var, summarise data, store return object as a dataframe: 

summarised_df <- 

  df %>% 

  group_by(Var) %>% 

  summarise(max_date_per_group = max(Date), mean_val_per_group = mean(Value), sd_per_group = sd(Value)) %>% 

  ungroup()
hello_friend
  • 5,682
  • 1
  • 11
  • 15