2

I have very basic question about replacing the values in the groups most frequent value.

Here is what I mean

df <- data.frame(x=c(12,12,5,13,13,5),y=c(10,22,22,22,10,22),gr=gl(2,3))
> df
   x  y gr
1 12 10  1
2 12 22  1
3  5 22  1
4 13 22  2
5 13 10  2
6  5 22  2

As we can see for gr1 x column's frequent value is 12 and less frequent one is 5 and for y column it is 10. I would like to replace those values with lets say 666 and 777 for x and y ,respectively. In my real data the frequent x values are always the same but less frequent ones can change so general solutions would be good. dplyr solution is preferable since the real data is already in the pipeline.

library(dplyr)
df%>%
group_by(gr)%>%
....

the expected output

> df
   x  y  gr
1 12  777  1
2 12  22   1
3 666 22   1
4 13  22   2
5 13  777  2
6 666 22   2
Alexander
  • 4,527
  • 5
  • 51
  • 98
  • 2
    We have to assume that you (at the very least) googled "most frequent value r". Please clarify why these answers weren't helpful. Cheers. – Henrik Mar 06 '18 at 23:42
  • 2
    related: what have you tried – De Novo Mar 06 '18 at 23:45
  • @Henrik Thanks. Yeap I searched and find this [post] (https://stackoverflow.com/questions/7735647/replacing-nas-with-latest-non-na-value?noredirect=1&lq=1) but it is for replacing NA values and no `dplyr` solution. I also googled `replace values with most frequent values within the group` and found this [post] (https://stackoverflow.com/questions/29255473/most-frequent-value-mode-by-group) and @dimitris_ps solution seems to ok. – Alexander Mar 07 '18 at 00:03
  • @DanHall Sorry I could not came with any idea:( – Alexander Mar 07 '18 at 00:04

2 Answers2

2
df2=df%>%
  group_by(gr)%>%
  mutate_all(
    funs(table(.)%>%
            which.max()%>%
             names()%>%
                as.numeric()
         )
    )

df%>%
  `is.na<-`(df2!=df)%>%
     replace_na(list(x=666,y=777))

    x   y gr
1  12 777  1
2  12  22  1
3 666  22  1
4  13  22  2
5  13 777  2
6 666  22  2
Onyambu
  • 67,392
  • 3
  • 24
  • 53
2

We can also do this with map2 after nesting. Create a function to calculate the most frequent value ('Mode')

library(tidyverse)
Mode <- function(x) {
  ux <- unique(x)
   ux[which.max(tabulate(match(x, ux)))]
}

After nesting the variables 'x', 'y', loop through those with map2 to replace the values that are not frequent with the corresponding replacement values i.e. 666, 777

df %>% 
   nest(-gr) %>% 
   mutate(data = map(data, ~ .x %>% 
                               map2_df(., c(666, 777), ~ 
                                replace(.x, .x != Mode(.x), .y))) ) %>% 
   unnest
#   gr   x   y
#1  1  12 777
#2  1  12  22
#3  1 666  22
#4  2  13  22
#5  2  13 777
#6  2 666  22

Or using data.table, it is a bit more easier. After grouping by 'gr' in the data.table object (setDT(df)), use Map to replace the corresponding columns non-frequent values with the corresponding values of interest (666, 777)

library(data.table)
setDT(df)[, Map(function(x, y) replace(x, x!= Mode(x), y), .SD, c(666, 777)), gr]
#   gr   x   y
#1:  1  12 777
#2:  1  12  22
#3:  1 666  22
#4:  2  13  22
#5:  2  13 777
#6:  2 666  22
akrun
  • 874,273
  • 37
  • 540
  • 662