
I have a data frame with the gender of specific names:

dfgender <- data.frame(name = c("Helen","Erik"), gender = c("F","M"))

How can I use the previous data frame to check the names in a column of another data frame, inserting "Neutral" when a name is not in the gender data frame? Example of the data frame with the names:

dfnames <- data.frame(names = c("Helen", "Von", "Erik", "Brook"))

Example of expected output

dfnames <- data.frame(name = c("Helen", "Von", "Erik", "Brook"), gender = c("F", "Neutral", "M", "Neutral"))
Erik Brole
    Since you have a lookup table already, you can do a left-join and then fill in `NA` values. A couple dozen options for doing that [here](https://stackoverflow.com/q/8161836/5325862) – camille Nov 21 '22 at 15:51
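The left-join-then-fill approach from the comment can also be done in base R with `merge()`. Below is a minimal sketch, not taken from any answer. The `row_id` column is a hypothetical helper I added because `merge()` does not keep the original row order.

```r
# Data from the question
dfgender <- data.frame(name = c("Helen", "Erik"), gender = c("F", "M"))
dfnames  <- data.frame(names = c("Helen", "Von", "Erik", "Brook"))

# Hypothetical helper column: merge() reorders rows, so remember the input order
dfnames$row_id <- seq_len(nrow(dfnames))

# Left join: all.x = TRUE keeps every row of dfnames, with NA gender
# where the name has no match in dfgender
out <- merge(dfnames, dfgender, by.x = "names", by.y = "name", all.x = TRUE)

# Put the rows back in their original order and drop the helper column
out <- out[order(out$row_id), c("names", "gender")]
rownames(out) <- NULL

# Fill the unmatched rows (as.character guards against factor columns)
out$gender <- as.character(out$gender)
out$gender[is.na(out$gender)] <- "Neutral"
out
```

The key column in the result takes its name from `by.x`, so it stays `names` as in `dfnames`.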

3 Answers


`left_join()` plus `replace_na()` should do it:

library(dplyr)
library(tidyr)

dfnames %>% left_join(dfgender, by = c('names' = 'name')) %>%
  mutate(gender = gender %>% as.character %>% replace_na('Neutral'))

#   names  gender
# 1 Helen       F
# 2   Von Neutral
# 3  Erik       M
# 4 Brook Neutral
Juan C
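The `as.character` step in the answer above matters on older R versions. Before R 4.0.0, `data.frame()` converted strings to factors by default, and a factor cannot take a value outside its levels. Below is a base R illustration of that pitfall. It does not call `replace_na()` itself, so it is not a statement about tidyr's exact behaviour on factors.

```r
# Simulate a pre-R-4.0 factor column, as the join could have produced
g <- factor(c("F", NA, "M", NA))

# Assigning a string that is not one of the levels ("F", "M") does not add
# a level: base R warns "invalid factor level, NA generated" and keeps NA
g_factor <- g
suppressWarnings(g_factor[is.na(g_factor)] <- "Neutral")
print(g_factor)

# Converting to character first lets the replacement go through
g_chr <- as.character(g)
g_chr[is.na(g_chr)] <- "Neutral"
print(g_chr)
```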

The (experimental) `rows_update()` could be an intuitive complement to @Juan C's answer:

library(dplyr)

dfnames |>
  mutate(gender = "Neutral") |>
  rows_update(rename(dfgender, names = name), "names")

Output:

  names  gender
1 Helen       F
2   Von Neutral
3  Erik       M
4 Brook Neutral
harre
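The "set a default, then overwrite the matches" idea behind the `rows_update()` answer can be sketched in base R with `match()`. This is an illustration, not dplyr's implementation. It assumes the names in `dfnames` are unique, because `match()` returns only the first hit.

```r
dfgender <- data.frame(name = c("Helen", "Erik"), gender = c("F", "M"))
dfnames  <- data.frame(names = c("Helen", "Von", "Erik", "Brook"))

# Step 1: give every row the default, like mutate(gender = "Neutral")
dfnames$gender <- "Neutral"

# Step 2: find where each lookup name sits in dfnames
# (match() returns only the first hit, so this assumes unique names)
pos <- match(dfgender$name, dfnames$names)

# Skip lookup names that do not occur in dfnames (pos is NA for those)
found <- !is.na(pos)

# Step 3: overwrite only the matched rows, like rows_update()
dfnames$gender[pos[found]] <- as.character(dfgender$gender[found])
dfnames
```

This sketch silently skips lookup names that are missing from `dfnames`. Check how your dplyr version's `rows_update()` treats such keys before assuming the two behave the same.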

Here's a solution similar to Juan C's but with, I think, a simpler replacement of NAs:

library(dplyr)
library(tidyr)

dfgender <- data.frame(name = c("Helen","Erik"), gender = c("F","M"))
dfnames <- data.frame(names = c("Helen", "Von", "Erik", "Brook"))

dfnames %>%
  left_join(dfgender, by = c("names" = "name")) %>%
  replace_na(list(gender = "Neutral"))
#   names  gender
# 1 Helen       F
# 2   Von Neutral
# 3  Erik       M
# 4 Brook Neutral

And here's another solution with no tidyr dependency:

library(dplyr)

dfgender <- data.frame(name = c("Helen","Erik"), gender = c("F","M"))
dfnames <- data.frame(names = c("Helen", "Von", "Erik", "Brook"))

dfnames %>%
  left_join(dfgender, by = c("names" = "name")) %>%
  mutate(gender = coalesce(gender, "Neutral"))
#   names  gender
# 1 Helen       F
# 2   Von Neutral
# 3  Erik       M
# 4 Brook Neutral
Santiago
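The last answer relies on `coalesce()` filling each `NA` with the fallback, recycling the single value `"Neutral"` across the column. Below is a minimal base R stand-in for that two-argument use. `coalesce_base()` is a hypothetical helper, not part of dplyr, and it only handles one fallback (scalar or same length).

```r
# Hypothetical helper, not part of dplyr: mimics coalesce(x, default)
# for a single fallback value or a fallback vector of the same length
coalesce_base <- function(x, default) {
  stopifnot(length(default) == 1L || length(default) == length(x))
  default <- rep_len(default, length(x))  # recycle a scalar default
  is_na <- is.na(x)
  x[is_na] <- default[is_na]               # fill only the NA positions
  x
}

# gender column as produced by the left join in the answer
gender <- c("F", NA, "M", NA)
coalesce_base(gender, "Neutral")
```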