replace value with group's frequent value

Question

I have very basic question about replacing the values in the groups most frequent value.

Here is what I mean

df <- data.frame(x=c(12,12,5,13,13,5),y=c(10,22,22,22,10,22),gr=gl(2,3))
> df
   x  y gr
1 12 10  1
2 12 22  1
3  5 22  1
4 13 22  2
5 13 10  2
6  5 22  2

As we can see for gr1 x column's frequent value is 12 and less frequent one is 5 and for y column it is 10. I would like to replace those values with lets say 666 and 777 for x and y ,respectively. In my real data the frequent x values are always the same but less frequent ones can change so general solutions would be good. dplyr solution is preferable since the real data is already in the pipeline.

library(dplyr)
df%>%
group_by(gr)%>%
....

the expected output

> df
   x  y  gr
1 12  777  1
2 12  22   1
3 666 22   1
4 13  22   2
5 13  777  2
6 666 22   2

We have to assume that you (at the very least) googled "most frequent value r". Please clarify why these answers weren't helpful. Cheers. — Henrik, Mar 06 '18 at 23:42
@Henrik Thanks. Yeap I searched and find this [post] (https://stackoverflow.com/questions/7735647/replacing-nas-with-latest-non-na-value?noredirect=1&lq=1) but it is for replacing NA values and no `dplyr` solution. I also googled `replace values with most frequent values within the group` and found this [post] (https://stackoverflow.com/questions/29255473/most-frequent-value-mode-by-group) and @dimitris_ps solution seems to ok. — Alexander, Mar 07 '18 at 00:03

score 2 · Answer 1 · answered Mar 07 '18 at 00:14

df2=df%>%
  group_by(gr)%>%
  mutate_all(
    funs(table(.)%>%
            which.max()%>%
             names()%>%
                as.numeric()
         )
    )

df%>%
  `is.na<-`(df2!=df)%>%
     replace_na(list(x=666,y=777))

    x   y gr
1  12 777  1
2  12  22  1
3 666  22  1
4  13  22  2
5  13 777  2
6 666  22  2

akrun · Accepted Answer · 2018-03-07T02:15:14.473

We can also do this with map2 after nesting. Create a function to calculate the most frequent value ('Mode')

library(tidyverse)
Mode <- function(x) {
  ux <- unique(x)
   ux[which.max(tabulate(match(x, ux)))]
}

After nesting the variables 'x', 'y', loop through those with map2 to replace the values that are not frequent with the corresponding replacement values i.e. 666, 777

df %>% 
   nest(-gr) %>% 
   mutate(data = map(data, ~ .x %>% 
                               map2_df(., c(666, 777), ~ 
                                replace(.x, .x != Mode(.x), .y))) ) %>% 
   unnest
#   gr   x   y
#1  1  12 777
#2  1  12  22
#3  1 666  22
#4  2  13  22
#5  2  13 777
#6  2 666  22

Or using data.table, it is a bit more easier. After grouping by 'gr' in the data.table object (setDT(df)), use Map to replace the corresponding columns non-frequent values with the corresponding values of interest (666, 777)

library(data.table)
setDT(df)[, Map(function(x, y) replace(x, x!= Mode(x), y), .SD, c(666, 777)), gr]
#   gr   x   y
#1:  1  12 777
#2:  1  12  22
#3:  1 666  22
#4:  2  13  22
#5:  2  13 777
#6:  2 666  22

Very nice `data.table` solution – IceCreamToucan Mar 07 '18 at 02:19 — IceCreamToucan, Mar 07 '18 at 02:19

replace value with group's frequent value

2 Answers2