I am using the gender package in R. Unfortunately, its `gender()` function returns an empty (zero-row) tibble when it cannot classify a name as male or female.

Is there a purrr-style way to apply `gender()` so that an empty tibble with m columns is replaced by a row of NAs with the same m columns, so that the output has the same number of rows as the input?

I would like to find a solution that does not involve writing a wrapper for the gender function (if possible).

mlachans
  • Maybe have a look [here](https://stackoverflow.com/questions/24172111/change-the-blank-cells-to-na) – prosoitos Oct 17 '19 at 00:52
  • Thank you. I don't think this is quite what I'm looking for. The "blank" cells in that post would correspond to a strictly positive dimension tibble, but I'm dealing with a tibble that is say 0-rows x m-columns that I'd like to convert to 1-row x m-columns. – mlachans Oct 17 '19 at 01:10
  • Can you make a [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example)? – camille Oct 17 '19 at 02:14
  • `library(gender); mydata <- data.frame(name = c("Neil", "Askey"), stringsAsFactors = FALSE); gender(mydata$name)` This generates a 1 x 6 tibble, but I'd like it to be a 2 x 6 tibble with a row of NAs. – mlachans Oct 17 '19 at 13:35

1 Answer

I'd approach this by storing the names in a data frame column, then joining the results from gender() back to the original data.

For example:

```r
library(gender)

mydata <- data.frame(name = c("Neil", "Askey"), stringsAsFactors = FALSE)

# all = TRUE keeps names that gender() could not classify, filled with NA
merge(mydata, gender(mydata$name), all = TRUE)
```

Result:

```
   name proportion_male proportion_female gender year_min year_max
1 Askey              NA                NA   <NA>       NA       NA
2  Neil          0.9964            0.0036   male     1932     2012
```
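
Note that `merge()` sorts its output by the join key, which is why Askey appears before Neil in the result. If the original row order matters, one possible variation is to look the names up with `match()` instead. The following is only a sketch: `res` is a hand-built stand-in for the output of `gender(mydata$name)`, using the values from the result shown above, so that the example runs without the gender package installed.

```r
mydata <- data.frame(name = c("Neil", "Askey"), stringsAsFactors = FALSE)

# Stand-in for gender(mydata$name): one row for "Neil", no row for "Askey".
res <- data.frame(
  name = "Neil",
  proportion_male = 0.9964,
  proportion_female = 0.0036,
  gender = "male",
  year_min = 1932,
  year_max = 2012,
  stringsAsFactors = FALSE
)

# match() returns NA for names that are missing from res; indexing a
# data frame with an NA row index produces a row of NAs.
idx <- match(mydata$name, res$name)
out <- cbind(mydata, res[idx, setdiff(names(res), "name"), drop = FALSE])
rownames(out) <- NULL
out
```

Here `out` has one row per input name, in input order, with NA in every result column for names that were not classified.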
neilfws
  • This worked for my task with minor modification. I was hoping for "insight" into the way zero-height tibbles actually work. If nothing shows up in a few days, I'll mark your answer as correct. – mlachans Oct 17 '19 at 01:37
  • Zero-height just means that each column of the tibble (or data frame) is a vector with length zero. That's different to having a value of NA. Operations like `rbind()` won't bind a row when there is no row and there are no base functions to convert zero-length to NA. You could write a function that checks for _e.g._ `if(nrow(mydf) == 0) { ... }` and then return a 1-row tibble with NA values, but I think joining back to the original data is simpler. – neilfws Oct 17 '19 at 03:32
  • I agree with your idea for an alternate solution. I considered writing a wrapper for gender that checks if the output has the characteristics you lay out, but I was hoping that there would be a "purrr" functional programming solution. – mlachans Oct 17 '19 at 13:35
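
The zero-row check discussed in the comments above could be sketched roughly as follows. The names `pad_empty()` and `fake_gender()` are hypothetical and not part of any package; `fake_gender()` is a stand-in for `gender()` so that the example runs in base R alone. The sketch is shown on plain data frames, and whether the same indexing behaves identically on tibbles would need checking.

```r
# Hypothetical helper: turn a zero-row data frame into a single row of NAs,
# keeping column names and types; non-empty inputs are returned unchanged.
pad_empty <- function(df) {
  if (nrow(df) == 0) df[NA_integer_, , drop = FALSE] else df
}

# Stand-in for gender(): one row for a known name, and a zero-row
# data frame with the same columns for an unknown one.
known <- data.frame(name = "Neil", gender = "male", stringsAsFactors = FALSE)
fake_gender <- function(nm) known[known$name == nm, , drop = FALSE]

# Apply the lookup name by name, pad empty results, then stack the pieces.
pieces <- lapply(c("Neil", "Askey"), function(nm) pad_empty(fake_gender(nm)))
padded <- do.call(rbind, pieces)
rownames(padded) <- NULL
padded
```

In a purrr pipeline, `map_dfr()` could presumably take the place of `lapply()` plus `rbind()`. Note that the padded row has NA in the `name` column as well, so the input name would still have to be filled in separately, which is one reason joining back to the original data may be simpler.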