fill missing rows in a dataframe

Question

I have a data frame shown below

   Hair   Eye    Freq
1  Black Brown      32
2  Brown Brown      53
3    Red Brown      10
4  Blond Brown       3
5    Red  Blue      10
6  Blond  Blue      30
7  Black Hazel      10
8  Blond Hazel       5

In the above data frame frequency of 4 hair colors Black, Brown, Red and Blond are noted across different eye colors Brown, Blue and Hazel. However, I would like to fill in the missing hair color frequency for respective eye color so that it results in a data frame as below. Any help is appreciated.

   Hair   Eye    Freq
1  Black Brown      32
2  Brown Brown      53
3    Red Brown      10
4  Blond Brown       3
5  Black  Blue      0
6  Brown  Blue      0
7    Red  Blue      10
8  Blond  Blue      30
9  Black Hazel      10
10 Brown Hazel      0
11   Red Hazel      0
12 Blond Hazel      5

@akrun added a python target as well now – Jaap Mar 22 '17 at 07:25 — Jaap, Mar 22 '17 at 07:25

Sathish · Answer 1 · 2017-03-22T06:56:40.390

Create a new table with combinations of hair and eye colors using expand.grid. Then use join approach to bind the frequencies of df1 to df2. Finally remove NAs.

library('data.table')
hair <- c('Black', 'Brown', 'Red', 'Blond')  # hair colors
eye <- c('Brown', 'Blue', 'Hazel')           # eye colors
df2 <- expand.grid(Hair = hair, Eye = eye)   # data frame with combinations of eye and hair colors
setDT(df2)[df1, `:=` (Freq = i.Freq), on = .(Hair, Eye)]  # join df2 with df1 based `on = .(Hair, Eye)` and bind `Freq` from df1 to df2
df2[is.na(Freq), Freq := 0 ]                # remove NA with 0

Output:

df2
#     Hair   Eye Freq
# 1: Black Brown   32
# 2: Brown Brown   53
# 3:   Red Brown   10
# 4: Blond Brown    3
# 5: Black  Blue    0
# 6: Brown  Blue    0
# 7:   Red  Blue   10
# 8: Blond  Blue   30
# 9: Black Hazel   10
# 10: Brown Hazel    0
# 11:   Red Hazel    0
# 12: Blond Hazel    5

Data:

df1 <- fread('id   Hair   Eye    Freq
1  Black Brown      32
2  Brown Brown      53
3    Red Brown      10
4  Blond Brown       3
5    Red  Blue      10
6  Blond  Blue      30
7  Black Hazel      10
8  Blond Hazel       5')

df1[, id:=NULL]

score 2 · Answer 2 · answered Mar 22 '17 at 07:05

One base R option is to create another data frame with expand.grid for every combination of Hair and Eye and merge it with the original one.

merge(expand.grid(Hair=unique(df$Hair),Eye=unique(df$Eye)), df[-1], all.x = TRUE)

#    Hair   Eye Freq
#1  Black  Blue   NA
#2  Black Brown   32
#3  Black Hazel   10
#4  Blond  Blue   30
#5  Blond Brown    3
#6  Blond Hazel    5
#7  Brown  Blue   NA
#8  Brown Brown   53
#9  Brown Hazel   NA
#10   Red  Blue   10
#11   Red Brown   10
#12   Red Hazel   NA

The result above gives NA, we can easily convert those NA's to 0

df1 <- merge(expand.grid(Hair = unique(df$Hair), Eye = unique(df$Eye)), df[-1], 
                                                              all.x = TRUE)
df1[is.na(df1)] <- 0

akrun · Accepted Answer · 2017-03-22T06:43:46.340

1

If we are using R, one option is complete from tidyr

library(tidyr)
complete(df1, Hair, Eye, fill = list(Freq = 0)) %>%
      arrange(factor(Eye, levels = unique(df1$Eye)), factor(Hair, levels = unique(df1$Hair)))
# A tibble: 12 × 3
#    Hair   Eye  Freq
#   <chr> <chr> <dbl>
#1  Black Brown    32
#2  Brown Brown    53
#3    Red Brown    10
#4  Blond Brown     3
#5  Black  Blue     0
#6  Brown  Blue     0
#7    Red  Blue    10
#8  Blond  Blue    30
#9  Black Hazel    10
#10 Brown Hazel     0
#11   Red Hazel     0
#12 Blond Hazel     5

edited Mar 22 '17 at 06:43

answered Mar 22 '17 at 06:39

akrun

874,273
37
540
662

1

you are a gem. I spent all day...you made it look very simple. Never heard complete function. Thank you – user5249203 Mar 22 '17 at 06:43
why is it required to use `list(Freq = 0)` and not just `Freq = 0`. ? – user5249203 Mar 22 '17 at 06:54
1

@user5249203 I think it is to consider for multiple variables – akrun Mar 22 '17 at 06:55

fill missing rows in a dataframe

3 Answers3

Linked