I am working with the IMDB dataset and am trying to find the best way to fill in missing values. An example is below:

showTitle        genres
Money Heist      Action,Crime,Mystery
The Office       Comedy
Money Heist      NA
Breaking Bad     Crime,Drama,Thriller
Money Heist      Action,Crime,Mystery
Money Heist      NA
The Office       NA

Desired Result

showTitle        genres
Money Heist      Action,Crime,Mystery
The Office       Comedy
Money Heist      Action,Crime,Mystery
Breaking Bad     Crime,Drama,Thriller
Money Heist      Action,Crime,Mystery
Money Heist      Action,Crime,Mystery
The Office       NA

What I attempted:

df %>% if(df$showTitle == "Money Heist"){df$genres} <- Action,Crime,Mystery

If a solution is possible without an if statement, that's fine, as long as it doesn't involve manually correcting the cells.

3 Answers

Based on the details provided, perhaps this type of approach would suit:

library(tidyverse)
# Create some 'fake' data
df <- data.frame(primaryTitle = c("Money Heist", "Money Heist", "Die Hard", "Die Hard", "Die Hard"),
                 genre = c("Action,Crime,Mystery", NA, "Action", "Action", NA))
df
#>   primaryTitle                genre
#> 1  Money Heist Action,Crime,Mystery
#> 2  Money Heist                 <NA>
#> 3     Die Hard               Action
#> 4     Die Hard               Action
#> 5     Die Hard                 <NA>

# Take the fake data
df %>%
  # sort the data by title
  arrange(primaryTitle) %>%
  # if the genre is "NA" and the title == the previous title,
  # fill in genre with the previous genre
  mutate(genre = if_else(is.na(genre) & primaryTitle == lag(primaryTitle),
                        lag(genre),
                        genre))
#>   primaryTitle                genre
#> 1     Die Hard               Action
#> 2     Die Hard               Action
#> 3     Die Hard               Action
#> 4  Money Heist Action,Crime,Mystery
#> 5  Money Heist Action,Crime,Mystery

Created on 2021-08-16 by the reprex package (v2.0.0)

With your example:

library(tidyverse)
df <- tibble::tribble(
  ~showTitle, ~genre,
  "Money Heist",      "Action,Crime,Mystery",
  "Money Heist",      NA,
  "Breaking Bad",     "Crime,Drama,Thriller",
  "Money Heist",      "Action,Crime,Mystery",
  "Money Heist",      NA,
  "The Office",       NA
)


df
#> # A tibble: 6 x 2
#>   showTitle    genre               
#>   <chr>        <chr>               
#> 1 Money Heist  Action,Crime,Mystery
#> 2 Money Heist  <NA>                
#> 3 Breaking Bad Crime,Drama,Thriller
#> 4 Money Heist  Action,Crime,Mystery
#> 5 Money Heist  <NA>                
#> 6 The Office   <NA>

df %>%
  arrange(showTitle) %>%
  mutate(genre = if_else(is.na(genre) & showTitle == lag(showTitle),
                        lag(genre),
                        genre))
#> # A tibble: 6 x 2
#>   showTitle    genre               
#>   <chr>        <chr>               
#> 1 Breaking Bad Crime,Drama,Thriller
#> 2 Money Heist  Action,Crime,Mystery
#> 3 Money Heist  Action,Crime,Mystery
#> 4 Money Heist  Action,Crime,Mystery
#> 5 Money Heist  Action,Crime,Mystery
#> 6 The Office   <NA>

Created on 2021-08-16 by the reprex package (v2.0.0)
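
One caveat with the lag-based approach above is that lag(genre) reads the original column, so each missing value can only be filled from the row directly before it. The sketch below uses a small hypothetical data frame (df2, not from the question) with two consecutive NA rows for the same title to show that the second one stays missing:

```r
library(dplyr)

# Hypothetical data: two consecutive missing genres for the same title
df2 <- data.frame(
  showTitle = c("Money Heist", "Money Heist", "Money Heist"),
  genre     = c("Action,Crime,Mystery", NA, NA),
  stringsAsFactors = FALSE
)

filled <- df2 %>%
  arrange(showTitle) %>%
  # same logic as above: copy the previous row's genre when the title matches
  mutate(genre = if_else(is.na(genre) & showTitle == lag(showTitle),
                         lag(genre),
                         genre))

# Row 2 is filled from row 1, but row 3 only sees row 2's original NA
filled$genre
```

If runs of consecutive missing values are likely in the real data, a grouped fill is probably the safer choice.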

jared_mamrot

This answer was originally given by @DPH; I am not sure why it was deleted.

You can use tidyr::fill to replace NA values for each showTitle.

library(dplyr)
library(tidyr)

df %>%
  group_by(showTitle) %>%
  fill(genre, .direction = 'updown') %>%
  ungroup()

#  showTitle    genre               
#  <chr>        <chr>               
#1 Money Heist  Action,Crime,Mystery
#2 Money Heist  Action,Crime,Mystery
#3 Breaking Bad Crime,Drama,Thriller
#4 Money Heist  Action,Crime,Mystery
#5 Money Heist  Action,Crime,Mystery
#6 The Office   Comedy              
#7 The Office   Comedy              
Ronak Shah

A one-liner in base R, indexing the subset with which():

df[which(df$showTitle == 'Money Heist' & is.na(df$genre)), 'genre'] <- "Action,Crime,Mystery"
     showTitle                  genre
1  Money Heist   Action,Crime,Mystery
2  Money Heist   Action,Crime,Mystery
3  Breaking Bad  Crime,Drama,Thriller
4  Money Heist   Action,Crime,Mystery
5  Money Heist   Action,Crime,Mystery
6  The Office    <NA>
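
The one-liner above hard-codes a single title and genre. A possible generalisation, still in base R, is sketched below. It is an illustration rather than part of the original answer: the sample data frame is rebuilt from the question with the column named genre, and the helper objects known, lookup and missing are names chosen here.

```r
# Sample data mirroring the question (column named `genre`, as in the answers)
df <- data.frame(
  showTitle = c("Money Heist", "The Office", "Money Heist", "Breaking Bad",
                "Money Heist", "Money Heist", "The Office"),
  genre     = c("Action,Crime,Mystery", "Comedy", NA, "Crime,Drama,Thriller",
                "Action,Crime,Mystery", NA, NA),
  stringsAsFactors = FALSE
)

# Lookup table: the first non-missing genre recorded for each title
known  <- df[!is.na(df$genre), ]
lookup <- known[!duplicated(known$showTitle), ]

# Fill only the missing cells by matching each title against the lookup;
# titles with no known genre get NA from match() and therefore stay missing
missing <- is.na(df$genre)
df$genre[missing] <- lookup$genre[match(df$showTitle[missing], lookup$showTitle)]

df
```

Note that this also fills the last "The Office" row with "Comedy", which differs from the desired result shown in the question, where that row stays NA.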
Nicolás Velasquez