I am struggling to find the proper way to select a specific row in a dataframe based on conditions from several columns, and then to write a given value into one cell of that row. I am looking for a dplyr-based solution, but couldn't find one after several hours of trying…

I have created an empty column ‘C’. For a given ‘A’ (factor), I want to select the row holding the maximum value of ‘B’ (numeric) within that A (not the max of the entire column B), and then write ‘something’ in the ‘C’ column of that row.

Here is a dummy DF replicating what I am trying to do:

A <- factor(c("19/09/2022", "19/09/2022", "19/09/2022", "20/09/2022", "20/09/2022", "20/09/2022", "21/09/2022", "21/09/2022", "21/09/2022", "22/09/2022", "22/09/2022", "22/09/2022"))

B <- c(0.1781223, 3.3488114,  4.1476595,  5.8611553, 10.9773307, 16.9890155, 24.0428161, 35.1776457, 40.4551331, 49.5663783, 63.9132875, 64.6766946)

df <- data.frame(A, B)

I create an empty column, in which I want to write.

df$C <- NA

I have tried something like this, using case_when, but there is an error on the type of B:


df %>%  
  filter(A == "19/09/2022") %>% 
  mutate(C = case_when(
    B == max(B, na.rm = T) ~ "something",
    B ~ NA))

Thank you for your help!

stefan
Kent0603
(1) You don't need to pre-instantiate `df$C`; it will be created in the `mutate` expression. (2) Floating-point equality is not a safe thing, see https://stackoverflow.com/q/9508518/3358272 and [R FAQ 7.31](https://cran.r-project.org/doc/FAQ/R-FAQ.html#Why-doesn_0027t-R-think-these-numbers-are-equal_003f). Instead, use a tolerance on the absolute-value difference between them, such as `abs(B - max(B, na.rm=TRUE)) < 1e-9` or similar. – r2evans Aug 08 '23 at 16:50

(3) *"there is an error"*: be explicit and include the text of the error in your question. In this case, it likely would have given you a stream of resolving-comments a little sooner (not that an answer inside of 50 minutes is bad, mind you). (4) `B ~ NA` is very unclear; while R does cast a floating-point to logical (anything not `0` is `TRUE`), it is a little sloppy to _rely_ on this behavior. And if you are not actually desiring that specific logic, then ... something else is wrong with that expression. – r2evans Aug 08 '23 at 16:51
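The tolerance comparison suggested in point (2) could look roughly like the sketch below. The tolerance `tol` and the helper column `is_max` are names made up for illustration, and the data is a shortened copy of the question's `df`. Note that `max(B)` returns an element of `B`, so plain `==` would also match here; the tolerance form is mainly a safeguard for cases where the value being compared comes out of a calculation.

```r
library(dplyr)

# Shortened copy of the example data from the question
df <- data.frame(
  A = factor(rep(c("19/09/2022", "20/09/2022"), each = 3)),
  B = c(0.1781223, 3.3488114, 4.1476595, 5.8611553, 10.9773307, 16.9890155)
)

# Assumed tolerance; choose a value that suits the scale of B
tol <- 1e-9

res <- df %>%
  group_by(A) %>%
  # TRUE where B is within `tol` of the maximum of its own group
  mutate(is_max = abs(B - max(B, na.rm = TRUE)) < tol) %>%
  ungroup()
```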

1 Answer

The issue is that in `case_when` the LHS of each formula should be a logical vector, whereas in `B ~ NA` it is a double (`B` itself). Perhaps you want `case_when(B == max(B, na.rm = TRUE) ~ "something", .default = NA)`.
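As a minimal sketch of that fix, here is the question's pipeline with a logical LHS and a `.default`. It assumes dplyr 1.1.0 or later, where `.default` is available (on older versions, a final `TRUE ~ NA_character_` branch plays the same role), and it uses a shortened copy of the example data.

```r
library(dplyr)

# Shortened copy of the example data from the question
df <- data.frame(
  A = factor(c("19/09/2022", "19/09/2022", "20/09/2022", "20/09/2022")),
  B = c(0.1781223, 4.1476595, 5.8611553, 16.9890155)
)

res <- df %>%
  filter(A == "19/09/2022") %>%
  mutate(C = case_when(
    # logical condition on the LHS, value on the RHS
    B == max(B, na.rm = TRUE) ~ "something",
    # every row not matched above gets NA
    .default = NA
  ))
```

Because of the `filter()`, this returns only the rows for the chosen date; keeping all rows requires grouping instead.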

Moreover, as you want to assign a value to `C` based on a condition evaluated within each level of `A`, you have to group your data, using either `group_by()` or the `.by` argument of `mutate()` (available in dplyr 1.1.0 and later). Also, as you are only checking one condition, you could go for an `if_else`:

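library(dplyr)

df %>%
  mutate(
    # flag the row with the per-group maximum of B, only for the chosen date
    C = if_else(
      A == "19/09/2022" & B == max(B, na.rm = TRUE),
      "something",
      NA
    ),
    # .by = A makes max(B) the maximum within each level of A
    .by = A
  )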
#>             A          B         C
#> 1  19/09/2022  0.1781223      <NA>
#> 2  19/09/2022  3.3488114      <NA>
#> 3  19/09/2022  4.1476595 something
#> 4  20/09/2022  5.8611553      <NA>
#> 5  20/09/2022 10.9773307      <NA>
#> 6  20/09/2022 16.9890155      <NA>
#> 7  21/09/2022 24.0428161      <NA>
#> 8  21/09/2022 35.1776457      <NA>
#> 9  21/09/2022 40.4551331      <NA>
#> 10 22/09/2022 49.5663783      <NA>
#> 11 22/09/2022 63.9132875      <NA>
#> 12 22/09/2022 64.6766946      <NA>
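For dplyr versions without `.by`, the `group_by()` route mentioned above might be sketched as follows. This is a hedged variant rather than a different method: it uses `NA_character_` so that the types match on older versions as well, and `res` is just an illustrative name.

```r
library(dplyr)

# Example data from the question
A <- factor(rep(c("19/09/2022", "20/09/2022", "21/09/2022", "22/09/2022"), each = 3))
B <- c(0.1781223, 3.3488114, 4.1476595, 5.8611553, 10.9773307, 16.9890155,
       24.0428161, 35.1776457, 40.4551331, 49.5663783, 63.9132875, 64.6766946)
df <- data.frame(A, B)

res <- df %>%
  group_by(A) %>%
  mutate(
    # max(B) is computed per group because of group_by(A)
    C = if_else(
      A == "19/09/2022" & B == max(B, na.rm = TRUE),
      "something",
      NA_character_
    )
  ) %>%
  # drop the grouping so later steps operate on the whole data frame
  ungroup()
```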
stefan