0

My dataframe is like this.

  SYMBOL CANCER_TYPE TIER 
1   AFDN    BLADDER3    2 
2    AHR    BLADDER3    1  
3    AHR    BLADDER2    1  
4   AKT1    BLADDER3    1  
5   AKT1    BLADDER2    2 
6    APC    BLADDER2    2  

How to mutate this dataframe in a way that it should look like following

  SYMBOL CANCER_TYPE   Highest TIER 
1   AFDN    BLADDER3  BLADDER3    2 
2    AHR        both  BLADDER3    1   
3   AKT1        both  BLADDER3    1  
4    APC    BLADDER2  BLADDER2    2 
Darren Tsai
  • 32,117
  • 5
  • 21
  • 51
Aryh
  • 479
  • 1
  • 4
  • 16
  • `?dplyr::slice_min` – LMc Aug 01 '23 at 16:41
  • Does this answer your question? [Extract row corresponding to minimum value of a variable by group](https://stackoverflow.com/questions/24070714/extract-row-corresponding-to-minimum-value-of-a-variable-by-group) – LMc Aug 01 '23 at 16:42
  • I am trying unique(setDT(i3)[order(SYMBOL, -CANCER_TYPE)], by = "SYMBOL"). Is it correct? – Aryh Aug 01 '23 at 16:42
  • Are you trying to get the minimum value of `CANCER_TYPE` by `TIER`? – LMc Aug 01 '23 at 16:46
  • Maybe `df %>% slice_max(CANCER_TYPE, by = SYMBOL)` – Andre Wildberg Aug 01 '23 at 16:48
  • 1
    The addition of the `Highest` column suggests to me that you are summarizing, not filtering. However, in your input data there is only one `AFDN` but in the desired output the data suggests multiple rows were found. Please explain your logic on how to get the second data from the first. – r2evans Aug 01 '23 at 16:52
  • I want unique values of SYMBOL. If they have both BLADDER2 AND BLADDER3. I want to choose BLADDER3. If only one then what ever it is. Also i want create another column that should say both if they have both types. – Aryh Aug 01 '23 at 16:53
  • "Highest TIER" is easy, the linked duplicate and comments cover it. But what about the 3rd column in your desired output which contains "both". What are the rules there? Are there really only 2 options to summarize? – divibisan Aug 01 '23 at 16:53

1 Answers1

2
library(dplyr)

df |>
  group_by(SYMBOL) |>
  mutate(Highest = ifelse(n_distinct(CANCER_TYPE) > 1, "both", CANCER_TYPE)) |>
  filter(Highest == "both" & CANCER_TYPE == "BLADDER3" | n() == 1) |>
  ungroup()
LMc
  • 12,577
  • 3
  • 31
  • 43