How can I fill NA-values in a data frame column based on the values from an other column?

Question

I'd like to fill the NA-values in F2-column, based on the the most common F2-value when grouped by F1-column.

  F1 F2
1 A  C
2 B  D
3 A  NA
4 A  C
5 B  NA

Desired outcome:

  F1 F2
1 A  C
2 B  D
3 A  C
4 A  C
5 B  D

Thank you for help

@Leero11 what value should be inserted when there are ties? – markus Jul 07 '20 at 13:04 — markus, Jul 07 '20 at 13:04
@markus in a tie situation either one is ok – Leero11 Jul 11 '20 at 07:26 — Leero11, Jul 11 '20 at 07:26

score 3 · Accepted Answer · answered Jul 07 '20 at 13:27

Here is a base R solution. First define a function for Mode (Taken from here) and then apply it to you data frame, i.e.

Mode <- function(x) {
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}

df$F2 <- with(df, ave(F2, F1, FUN = function(i) replace(i, is.na(i), Mode(i))))

df
#  F1 F2
#1  A  C
#2  B  D
#3  A  C
#4  A  C
#5  B  D

score 1 · Answer 2 · answered Jul 07 '20 at 13:16

Here is one way using dplyr :

library(dplyr)

df %>%
  group_by(F1) %>%
  mutate(F2 = replace(F2, is.na(F2), 
                      names(sort(table(F2), decreasing = TRUE)[1])))

#  F1    F2   
#  <chr> <chr>
#1 A     C    
#2 B     D    
#3 A     C    
#4 A     C    
#5 B     D

In case of ties, preference is given to lexicographic order.

score 0 · Answer 3 · answered Jul 07 '20 at 13:16

Try this:

First in df2 I get max count by the variable F1 where F2 is not missing. That will give you the most common F2 value when groups by F1. I join it back onto the original data.frame and use a mutate to fill by the new variable F2_fill and then remove it from this variable from the data.frame.

library(tidyverse)

df <- tribble(
  ~F1, ~F2,
 'A',  'C',
 'B' , 'D',
 'A'  ,NA,
 'A',  'C',
 'B',  NA)

df2 <- df %>% 
        group_by(F1) %>% 
        count(F2) %>% 
        filter(!is.na(F2), n == max(n)) %>% 
        select(-n) %>% 
        rename(F2_fill = F2)

df3 <- left_join(df,df2, by="F1") %>% 
      mutate(F2 = ifelse(is.na(F2), F2_fill,F2)) %>% 
        select(-F2_fill)

score 0 · Answer 4 · answered Jul 07 '20 at 14:37

You can use ave with table and which.max and subsetting with is.na when it is a character.

i <- is.na(x$F2)
x$F2[i] <- ave(x$F2, x$F1, FUN=function(y) names(which.max(table(y))))[i]
x
#  F1 F2
#1  A  C
#2  B  D
#3  A  C
#4  A  C
#5  B  D

Data:

x <- data.frame(F1 = c("A", "B", "A", "A", "B")
  , F2 = c("C", "D", NA, "C", NA))

How can I fill NA-values in a data frame column based on the values from an other column?

4 Answers4