I'm trying to guess a person's gender from their first name. I know there is a gender package, but I want to apply the same approach to my own data.

As a beginner, I tried copying the gender package's code, but it returns an empty result.

This is my data frame, named namestat:

dput(head(namestat,10))
structure(list(name = c("AABIA", "AABIDA", "AABISH", "AADARSH", 
"AADIA", "AAEISHA", "AAESHA", "AAFAF", "AAFIA", "AAFIRA"), female = c(1, 
2, 1, 2, 1, 1, 1, 1, 19, 1), male = c(0, 0, 0, 0, 0, 0, 0, 0, 
0, 0)), row.names = c(NA, 10L), class = "data.frame")

This is the code:

genderfunc <- function(names) {

    namestat %>%
        filter(name %in% tolower(names)) %>%
        group_by(name) %>%
        summarise(female = sum(female),
                  male = sum(male)) %>%
        mutate(proportion_male = round((male / (male + female)),
                                       digits = 4),
               proportion_female = round((female / (male + female)),
                                         digits = 4)) %>%
        mutate(gender = ifelse(proportion_female == 0.5, "either",
                               ifelse(proportion_female > 0.5, "female",
                                      "male"))) %>%
        select(name, proportion_male, proportion_female, gender)

}

I expect this output from genderfunc("AABIA"):

  name  proportion_male proportion_female gender
  <chr>           <dbl>             <dbl> <chr>
1 AABIA               0                 1 female

but currently I receive an empty result.

NelsonGon
Nick Zhao
  • Provide a sample of namestat with `dput` – NelsonGon Apr 22 '19 at 02:57
  • Use **dput** for data not links to **drive**. Please read [this](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). – NelsonGon Apr 22 '19 at 03:04
  • Thank you, I have completed those changes you mentioned. Sorry for the drive link. – Nick Zhao Apr 22 '19 at 03:15
  • Does `Nick` exist in namestat? – NelsonGon Apr 22 '19 at 03:38
  • Sorry. It does not. I have altered the text. – Nick Zhao Apr 22 '19 at 03:40
  • I mean in your original namestat. It seems namestat is a "dictionary" that holds stats on number of people with a given name. – NelsonGon Apr 22 '19 at 03:41
  • This `name %in% tolower(names)` will fail because _name_ is in capital letters while `tolower(names)` produces `aabia`, i.e. lowercase. So you have two options: either `namestat %>% filter(tolower(name) %in% tolower(names)) %>% ...` or `namestat %>% filter(name %in% names) %>% ...` – A. Suliman Apr 22 '19 at 03:50
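
The case mismatch described in the last comment can be reproduced in base R without dplyr. This is only a minimal sketch: it uses the first three names from the `dput` output above, and `lookup`, `original`, `both_lower` and `as_is` are hypothetical variable names.

```r
# First three names from the question's dput() output (all upper case).
name <- c("AABIA", "AABIDA", "AABISH")
lookup <- "AABIA"  # hypothetical input, as in genderfunc("AABIA")

# Original condition: upper-case names compared against a lower-cased input.
original <- name %in% tolower(lookup)
print(original)    # FALSE FALSE FALSE, so filter() keeps no rows

# Option 1: normalise both sides to the same case.
both_lower <- tolower(name) %in% tolower(lookup)
print(both_lower)  # TRUE FALSE FALSE

# Option 2: compare as-is (matches only if the input is already upper case).
as_is <- name %in% lookup
print(as_is)       # TRUE FALSE FALSE
```

Option 1 is the more forgiving of the two, since it also matches input such as `"Aabia"`.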

1 Answer

Too long to add as a comment. Using this works for me:

library(dplyr)

select_me <- function(nam) {
  namestat %>%  # the data frame from the question
    group_by(name) %>%
    summarise(female = sum(female),
              male = sum(male)) %>%
    mutate(proportion_male = round(male / (male + female), digits = 4),
           proportion_female = round(female / (male + female), digits = 4)) %>%
    mutate(gender = ifelse(proportion_female == 0.5, "either",
                           ifelse(proportion_female > 0.5, "female",
                                  "male"))) %>%
    dplyr::select(name, proportion_male, proportion_female, gender) %>%
    # Compare names as-is: no tolower(), so "AABIA" matches the upper-case data
    filter(name %in% nam)
}
select_me("AABIA")


# A tibble: 1 x 4
  name  proportion_male proportion_female gender
  <chr>           <dbl>             <dbl> <chr> 
1 AABIA               0                 1 female
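
As a possible variation on the function above, the sketch below filters before summarising and normalises case on both sides, so that `"aabia"` and `"AABIA"` both match. It assumes dplyr is installed. `guess_gender` and its `data` argument are hypothetical names, and the three-row `namestat` is a cut-down copy of the question's sample, not the real table.

```r
library(dplyr)

# Cut-down copy of the question's data (first three rows of the dput output).
namestat <- data.frame(
  name   = c("AABIA", "AABIDA", "AABISH"),
  female = c(1, 2, 1),
  male   = c(0, 0, 0)
)

# Hypothetical helper: case-insensitive lookup that filters before aggregating.
guess_gender <- function(names, data = namestat) {
  data %>%
    filter(toupper(name) %in% toupper(names)) %>%  # same case on both sides
    group_by(name) %>%
    summarise(female = sum(female),
              male = sum(male)) %>%
    mutate(proportion_male = round(male / (male + female), digits = 4),
           proportion_female = round(female / (male + female), digits = 4),
           gender = ifelse(proportion_female == 0.5, "either",
                           ifelse(proportion_female > 0.5, "female",
                                  "male"))) %>%
    dplyr::select(name, proportion_male, proportion_female, gender)
}

result <- guess_gender("aabia")
print(result)
```

With this sample, `guess_gender("aabia")` and `guess_gender("AABIA")` should return the same single row, and a name that is not in the table should give a zero-row result rather than an error.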
NelsonGon