I'm using the sub function to clean up a dataset, recoding free-response gender information into 3 categories (Male, Female, Rainbow). The function works fine except for a few values, which are the following:

"Rainbow?"
"Male (CIS)"
"Guy (-ish) ^_^"
"Female (trans)"
"Female (cis)"

What is confusing me is that the function worked 40 times for the other values I needed to change but I can't change these ones. Thanks!!!!

I've tried the function numerous times but it won't work for those values and I don't know why.

mh_in_tech <- data.frame(
  id = 1:4, 
  gender = c('femail', 'Femake', 'Rainbow?', 'Male (CIS)')
)

mh_in_tech$Gender_clean <- mh_in_tech$gender

# Here is some code that worked:
mh_in_tech$Gender_clean <- sub('femail', 'Female', mh_in_tech$Gender_clean)
mh_in_tech$Gender_clean <- sub('Femake', 'Female', mh_in_tech$Gender_clean)

# Code that did not work: 
mh_in_tech$Gender_clean <- sub('Rainbow?', 'Rainbow', mh_in_tech$Gender_clean)
mh_in_tech$Gender_clean <- sub('Male (CIS)', 'Male', mh_in_tech$Gender_clean)

mh_in_tech
#>   id     gender Gender_clean
#> 1  1     femail       Female
#> 2  2     Femake       Female
#> 3  3   Rainbow?     Rainbow?
#> 4  4 Male (CIS)   Male (CIS)
shs
    You say it fails, but you don't say _what_ fails. Where's the code? What does the data look like, both the input and the expected output? – r2evans Feb 10 '23 at 16:52
    Welcome to SO, Gaura Rader! Questions on SO (especially in R) do much better if they are reproducible and self-contained. By that I mean including attempted code (please be explicit about non-base packages), sample representative data (perhaps via `dput(head(x))` or building data programmatically (e.g., `data.frame(...)`), possibly stochastically), perhaps actual output (with verbatim errors/warnings) versus intended output. Refs: https://stackoverflow.com/q/5963269, [mcve], and https://stackoverflow.com/tags/r/info. – r2evans Feb 10 '23 at 16:53
    We'd like to help, but we don't have enough information to know what you're doing. Can you edit your question to include your code and some examples that do work? – Ben Bolker Feb 10 '23 at 16:53
  • I would likely confuse myself by taking my errors in `$gender`, transferring them to `$gender_clean`, then taking them 'out', rather than simply `mh$gender_clean <- sub('not_clean', 'preferred', mh$gender)`... and further taking note of @nir-graham on fixed =. – Chris Feb 11 '23 at 17:33

1 Answer

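The cause of your issue is that sub and related functions treat the pattern as a regular expression by default, and characters such as ?, ( and ) have special meaning there. In 'Rainbow?' the ? makes the preceding w optional, so the pattern matches only "Rainbow" and the literal "?" is left in place; in 'Male (CIS)' the parentheses form a group, so the pattern looks for "Male CIS" and never matches. You seem to want fixed (literal) replacements, so it should be easy for you to adapt by passing fixed = TRUE. Compare the following: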

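# Pattern read as a regex: the parentheses are a group, so nothing matches
sub(x = "Male (CIS)", pattern = "Male (CIS)", replacement = "male")
# Pattern read literally: the whole string matches and is replaced
sub(x = "Male (CIS)", pattern = "Male (CIS)", replacement = "male", fixed = TRUE)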
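
Applied to the sample data from the question, this could look roughly like the sketch below. It reuses the question's mh_in_tech data frame and column names; the variable escaped is only an illustrative name, and the second option (escaping the metacharacters with a double backslash) is an alternative I'm adding for comparison, not something the original code used.

```r
# Sample data copied from the question
mh_in_tech <- data.frame(
  id = 1:4,
  gender = c("femail", "Femake", "Rainbow?", "Male (CIS)")
)
mh_in_tech$Gender_clean <- mh_in_tech$gender

# Option 1: fixed = TRUE makes sub() treat each pattern as a literal string
mh_in_tech$Gender_clean <- sub("femail", "Female", mh_in_tech$Gender_clean, fixed = TRUE)
mh_in_tech$Gender_clean <- sub("Femake", "Female", mh_in_tech$Gender_clean, fixed = TRUE)
mh_in_tech$Gender_clean <- sub("Rainbow?", "Rainbow", mh_in_tech$Gender_clean, fixed = TRUE)
mh_in_tech$Gender_clean <- sub("Male (CIS)", "Male", mh_in_tech$Gender_clean, fixed = TRUE)

# Option 2 (alternative): keep regex matching but escape the metacharacters
# "escaped" is a hypothetical name used only for this illustration
escaped <- sub("Female \\(cis\\)", "Female", "Female (cis)")

# Gender_clean should now hold: Female, Female, Rainbow, Male
mh_in_tech$Gender_clean
```

fixed = TRUE is usually the simpler choice when every pattern is an exact string from the data; escaping is only worth it if you also want regex features such as anchors or alternation in the same pattern.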
Nir Graham