Replace values by the most present value

Question

library(tibble)

ID <- c("A", "A", "A", "B", "B", "c")
Value <- c("blue", "blue", "green", "red", "orange", NA)
df <- tibble(ID, Value)

I need to group by `ID` and replace `Value` with its most frequent value within each group. If there is a tie (as for `ID == "B"`), I want the first value.

The `Value` variable should then look like this:

Value_output <- c("blue", "blue", "blue", "red", "red", NA)

Take a look at [this post](https://stackoverflow.com/a/8189441/8583393) and then do `... %>% mutate(Value = Mode(Value))` – markus, Jul 17 '18 at 14:46

score 2 · Answer 1 · answered Jul 17 '18 at 14:47


A solution with the data.table package: count each `Value` within each `ID`, then keep the most frequent one.

library(data.table)

ID <- c("A", "A", "A", "B", "B", "c")
Value <- c("blue", "blue", "green", "red", "orange", NA)

foo <- data.table(ID, Value)
setkey(foo, ID)
# count Value per ID, sort by count, keep the top Value per ID, then join back on ID
foo[foo[, .N, .(ID, Value)][order(N, decreasing = TRUE)][, .(Value = Value[1]), ID]]$i.Value
[1] "blue" "blue" "blue" "red"  "red"  NA
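Broken into named steps, the one-liner above does roughly the following. This is a sketch; the intermediate names `counts`, `top` and `result` are illustrative and not part of the answer.

```r
library(data.table)

ID <- c("A", "A", "A", "B", "B", "c")
Value <- c("blue", "blue", "green", "red", "orange", NA)
foo <- data.table(ID, Value)
setkey(foo, ID)

# Step 1: count how often each Value occurs within each ID
counts <- foo[, .N, by = .(ID, Value)]

# Step 2: most frequent first; data.table sorts stably, so tied counts
# keep their original row order ("red" stays ahead of "orange")
counts <- counts[order(N, decreasing = TRUE)]

# Step 3: keep the first (most frequent) Value for each ID
top <- counts[, .(Value = Value[1]), by = ID]

# Step 4: join top onto foo by foo's key (ID); top's Value comes back as i.Value
result <- foo[top]$i.Value
print(result)
```

The tie-break relies on the stable sort in step 2: among equal counts, whichever value appeared first in the data stays first.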

answered by pogibas

Thank you very much, exactly what I was looking for! – Mostafa90 Jul 17 '18 at 14:55

If you had akrun's helper function, the `data.table` solution could be reduced to `df[, Value := Mode(Value), ID]` (with `df` as a data.table, e.g. after `setDT(df)`). – s_baldur Jul 17 '18 at 15:13
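A runnable sketch of that one-liner, assuming akrun's `Mode` helper and building the data directly as a data.table (`dt` is an illustrative name):

```r
library(data.table)

# akrun's Mode helper: most frequent value; on a tie, the first-seen value wins
Mode <- function(x) {
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}

ID <- c("A", "A", "A", "B", "B", "c")
Value <- c("blue", "blue", "green", "red", "orange", NA)
dt <- data.table(ID, Value)

# Overwrite Value in place with the mode of each ID group
dt[, Value := Mode(Value), by = ID]
print(dt)
```

Because `:=` modifies `dt` by reference, no reassignment is needed.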

akrun · Accepted Answer · score 2 · 2018-07-17T15:59:34.303

We can get the mode by group:

library(dplyr)
df %>%
  group_by(ID) %>%
  arrange(ID, is.na(Value)) %>% # on a tie, put non-NA values first so they win
  mutate(Value_output = Mode(Value))

where

Mode <- function(x) {
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}

data

ID <- c("A", "A", "A", "B", "B", "c", "c")
Value <- c("blue", "blue", "green", "red", "orange", NA, "yellow")
df <- tibble(ID, Value)
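To see why `Mode` returns the first value on a tie, and why the `arrange()` step matters for `NA`, here are its steps run by hand on values from the question (the variable names are illustrative):

```r
x <- c("red", "orange")           # group "B" from the question
ux <- unique(x)                   # distinct values in first-seen order: "red" "orange"
idx <- match(x, ux)               # position of each element within ux: 1 2
counts <- tabulate(idx)           # occurrences of each distinct value: 1 1
winner <- ux[which.max(counts)]   # which.max() returns the FIRST maximum on ties
print(winner)                     # "red"

# Group "c" with NA listed first: match() treats NA as a value, so NA ties with "yellow" and wins
y <- c(NA, "yellow")
uy <- unique(y)
winner_na_first <- uy[which.max(tabulate(match(y, uy)))]   # NA

# With NA sorted last, as arrange(ID, is.na(Value)) does, "yellow" wins the tie
z <- c("yellow", NA)
uz <- unique(z)
winner_na_last <- uz[which.max(tabulate(match(z, uz)))]    # "yellow"
```

So `Mode` breaks ties purely by order of appearance, which is why sorting `NA` to the end of each group is enough to prefer non-NA values on a tie.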

edited Jul 17 '18 at 15:59 · answered Jul 17 '18 at 14:58 by akrun

If I add the row `c, yellow` to my data frame and want to keep `yellow` rather than `NA` for ID `c`, how do I solve that? – Mostafa90 Jul 17 '18 at 15:52
@DimitriPetrenko Updated the post – akrun Jul 17 '18 at 15:59
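An alternative for this follow-up, offered as a sketch rather than the answer's method: a hypothetical helper `Mode_non_na` that drops `NA` before counting and only returns `NA` when a group contains nothing else.

```r
library(dplyr)

# Hypothetical variant of Mode: ignore NA unless the group has no other values
Mode_non_na <- function(x) {
  non_na <- x[!is.na(x)]
  if (length(non_na) == 0) return(x[1])  # all-NA group: return NA of the same type
  ux <- unique(non_na)
  ux[which.max(tabulate(match(non_na, ux)))]
}

df <- tibble(
  ID    = c("A", "A", "A", "B", "B", "c", "c"),
  Value = c("blue", "blue", "green", "red", "orange", NA, "yellow")
)

out <- df %>%
  group_by(ID) %>%
  mutate(Value_output = Mode_non_na(Value)) %>%
  ungroup()
print(out)
```

The behaviour differs slightly from the `arrange()` approach: sorting `NA` last only changes who wins a tie, whereas this helper ignores `NA` entirely, so a non-NA value wins even if `NA` is more frequent in its group. It also leaves the original row order untouched.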

score 2 · Answer 3 · answered Jul 17 '18 at 15:03

Using base R:

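# most frequent Value per ID (the formula method drops NA rows by default)
lot <- aggregate(
  Value ~ ID,
  df,
  function(x) names(sort(table(x), decreasing = TRUE))[1]
)
# map each row's ID back to its group's top Value
df$Value <- lot[match(df$ID, lot$ID), "Value"]
df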
  ID    Value 
  <chr> <chr> 
1 A     blue  
2 A     blue  
3 A     blue  
4 B     orange
5 B     orange
6 c     NA
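In the output above, `B` gets `orange` rather than the `red` the question asks for: `table()` orders its names alphabetically, so tied counts are not broken by first appearance. A base R sketch that keeps the first-seen value on ties, reusing akrun's `Mode` helper with `ave()` (the data is rebuilt as a plain data frame so the block runs on its own):

```r
# akrun's Mode helper: most frequent value; on a tie, the first-seen value wins
Mode <- function(x) {
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}

df <- data.frame(
  ID    = c("A", "A", "A", "B", "B", "c"),
  Value = c("blue", "blue", "green", "red", "orange", NA),
  stringsAsFactors = FALSE
)

# ave() applies Mode within each ID group and recycles the result over the group's rows
df$Value <- ave(df$Value, df$ID, FUN = Mode)
print(df)
```

Here `c` still ends up `NA`, but because `NA` is its only value, not because the row was dropped before counting as in the `aggregate()` version.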
