2

I've got a data frame with 34 columns and 12,964 rows, two of these columns are Gene.Name and Mutation_Frequency. For example:

Gene.Name Mutation_Frequency
CTLA4 0
TP53 4
CTLA4 2
CTLA4 2
TP53 4
TP53 6

I now want to create a column called "Highest_Mutation_Frequency" which tells me the highest mutation frequency for the Gene.Name, and puts it in a new column, like this:

Gene.Name Mutation_Frequency Highest_Mutation_Frequency
CTLA4 0 2
TP53 4 6
CTLA4 2 2
CTLA4 2 2
TP53 0 6
TP53 6 6

I realize I could probably use the max() command, but I'm not sure how to implement this. As always, any help is appreciated!

Edit: Although this is quite similar to another question: Select the row with the maximum value in each group this question also involves producing unique rows and placing them in another data frame.

Anoushiravan R
  • 21,622
  • 3
  • 18
  • 41
KLM117
  • 407
  • 3
  • 13
  • 2
    Try looking at [this post](https://stackoverflow.com/questions/25314336/extract-the-maximum-value-within-each-group-in-a-dataframe) – Ben Aug 07 '21 at 16:55

3 Answers3

5

You could use:

library(dplyr)

data %>%
  group_by(Gene.Name) %>%
  mutate(Highest_Mutation_Frequency = max(Mutation_Frequency))
Martin Gal
  • 16,640
  • 5
  • 21
  • 39
5

An option with base R

aggregate(cbind(Highest_Mutation_Frequency = Mutation_Frequency) ~ Gene.Name, data, FUN = max)
akrun
  • 874,273
  • 37
  • 540
  • 662
3

Another base R option using unique + ave

unique(
  transform(
    df,
    Highest_Mutation_Frequency = ave(Mutation_Frequency,Gene.Name,FUN = max)
  )[c("Gene.Name","Highest_Mutation_Frequency")]
)
ThomasIsCoding
  • 96,636
  • 9
  • 24
  • 81