
I have a table with multiple variables. One of them is the Sex column for the participants of a study conducted by another lab. The problem is that sex is coded inconsistently: sometimes as F, sometimes as FEMALE or female, and the same goes for the males. I need to put all the females under F and all the males under M using the mapvalues function from the plyr package (it's very old, I know). This is the data frame; it's pretty basic:

mre_data <- structure(list(name = c("John", "Clara", "Smith", "Ray", "karen", "Ruba", "Josh", "Jennifer", "David", "Maria", "Sam"), 
               sex = c("Male",  "F", "MALE", "M", "FEMALE", "female", "MALE", "F", "male", "FEMALE", "M"), 
               age = c(30L, 32L, 54L, 42L, 11L, 34L, 67L, 49L, 27L, 18L, 30L)), 
          class = "data.frame", row.names = c("1", "2", "3", "4",  "5", "6", "7", "8", "9", "10", "11"))
Edited by dario; asked by Kev

1 Answer

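We don't need plyr or any external packages. substr(x, 1, 1) takes the first character of each entry and toupper() capitalizes it, so for example:

sex <- c("fema", "f", "F", "M", "male", "MALe")

toupper(substr(sex, 1, 1))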

Returns:

[1] "F" "F" "F" "M" "M" "M"
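One caveat of the first-letter rule is that it assumes every entry starts with m or f. A slightly more defensive sketch (my own assumption, not part of the answer; `standardize_sex` is a hypothetical helper name) trims stray whitespace first and turns anything that does not reduce to M or F into NA, so unexpected codes are easy to spot:

```r
# Hypothetical helper: standardize sex codes by their first letter
standardize_sex <- function(x) {
  # Drop leading/trailing whitespace, then keep the first character, uppercased
  first <- toupper(substr(trimws(x), 1, 1))
  # Anything that is not M or F (e.g. "unknown", "" or NA) becomes NA
  first[!first %in% c("M", "F")] <- NA_character_
  first
}

standardize_sex(c("fema", " f", "F", "M", "male", "MALe", "unknown", NA))
# gives "F" "F" "F" "M" "M" "M" NA NA
```

Mapping unknowns to NA rather than to their first letter is a design choice: a stray "U" would otherwise slip silently into the cleaned column.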

Edit

Use the solution above like this. In R, unlike many other programming languages, we very rarely modify an object in place, so we almost always need to assign the result back:

mre_data$sex <- toupper(substr(mre_data$sex, 1, 1))

Or if you want to preserve the original column:

mre_data$sex_fixed <- toupper(substr(mre_data$sex, 1, 1))
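To check that the assignment really changed the data frame (the problem raised in the comments), one option is to cross-tabulate the original spellings against the standardized codes. This sketch rebuilds the question's data with data.frame() instead of structure() (same columns, default row names) so it runs on its own:

```r
mre_data <- data.frame(
  name = c("John", "Clara", "Smith", "Ray", "karen", "Ruba", "Josh",
           "Jennifer", "David", "Maria", "Sam"),
  sex  = c("Male", "F", "MALE", "M", "FEMALE", "female", "MALE", "F",
           "male", "FEMALE", "M"),
  age  = c(30L, 32L, 54L, 42L, 11L, 34L, 67L, 49L, 27L, 18L, 30L),
  stringsAsFactors = FALSE
)

# Keep the original column and store the standardized codes next to it
mre_data$sex_fixed <- toupper(substr(mre_data$sex, 1, 1))

# Each original spelling should land in exactly one of the F or M columns
table(original = mre_data$sex, fixed = mre_data$sex_fixed)
```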

Another Edit:

As requested, here is a solution using plyr::mapvalues:

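# Map every spelling found in the data to its standardized code (from/to pairs matched by position)
mre_data$sex_fixed <- plyr::mapvalues(mre_data$sex,
                                      from = c("Male", "MALE", "M", "male", "F", "FEMALE", "female"),
                                      to   = c("M",    "M",    "M", "M",    "F", "F",      "F"))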

mre_data

Is now:

         name    sex age sex_fixed
  1      John   Male  30         M
  2     Clara      F  32         F
  3     Smith   MALE  54         M
  4       Ray      M  42         M
  5     karen FEMALE  11         F
  6      Ruba female  34         F
  7      Josh   MALE  67         M
  8  Jennifer      F  49         F
  9     David   male  27         M
  10    Maria FEMALE  18         F
  11      Sam      M  30         M

While this works, it doesn't make much sense to me, since we have to specify each replacement pair individually, whereas what we actually want is to apply a rule (take the first letter of the word, uppercased) to each entry. mapvalues is intended for use cases where there is no simple rule and we have to specify each replacement pair.
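A middle ground, if you do want explicit replacement pairs but fewer of them, is to lowercase the values first so all case variants collapse. The sketch below swaps in a base-R named lookup vector in place of plyr::mapvalues; it is my own illustration, not the answer's method. One difference to keep in mind: a spelling missing from the lookup becomes NA here, whereas mapvalues leaves values not found in `from` unchanged.

```r
sex <- c("Male", "F", "MALE", "M", "FEMALE", "female", "MALE", "F",
         "male", "FEMALE", "M")

# After tolower() only four distinct spellings remain, so four pairs suffice
lookup <- c(male = "M", m = "M", female = "F", f = "F")

# Index the named vector by the lowercased values; unname() drops the names
sex_fixed <- unname(lookup[tolower(sex)])
sex_fixed
```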

Alternatively (and this really doesn't make sense; you should use the direct route via x <- toupper(substr(...))), we can combine the "automatic" rule with mapvalues to create a very confused and overcomplicated solution ;) At least we don't have to hardcode each replacement pair:

# Build the from/to pairs from the distinct values instead of hardcoding them
mre_data$sex_fixed2 <- plyr::mapvalues(mre_data$sex,
                                       from = unique(mre_data$sex),
                                       to   = toupper(substr(unique(mre_data$sex), 1, 1)))

mre_data

Is now:

         name    sex age sex_fixed sex_fixed2
  1      John   Male  30         M          M
  2     Clara      F  32         F          F
  3     Smith   MALE  54         M          M
  4       Ray      M  42         M          M
  5     karen FEMALE  11         F          F
  6      Ruba female  34         F          F
  7      Josh   MALE  67         M          M
  8  Jennifer      F  49         F          F
  9     David   male  27         M          M
  10    Maria FEMALE  18         F          F
  11      Sam      M  30         M          M
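For comparison, the unique() trick boils down to a positional lookup: each value equal to from[i] is replaced by to[i]. That mapping can be sketched in base R with match(), which replaces plyr rather than reproducing its code. It relies on every value of the input appearing in `from`, which holds here because `from` is built with unique():

```r
sex <- c("Male", "F", "MALE", "M", "FEMALE", "female", "MALE", "F",
         "male", "FEMALE", "M")

from <- unique(sex)                  # each distinct spelling once
to   <- toupper(substr(from, 1, 1))  # the standardized code for each spelling

# match() returns, for every element of sex, its position in `from`
sex_fixed2 <- to[match(sex, from)]
sex_fixed2
```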
Answered by dario
  • That didn't work. It just created a vector with c("female", "f", "F", "M", "male", "MALE") and replaced them with F and M; the data frame is still the same, nothing changed. – Kev Oct 14 '21 at 08:35
  • Did you assign the return values to the data.frame column? It would be easier for you (and us) if you provided a [minimal reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example/5963610#5963610). Adding an MRE and an example of the desired output makes it much easier for others to find and test an answer to your question. That way you can help others to help you! – dario Oct 14 '21 at 08:36
  • No, I just typed your code. Should I add something? Wait, I'll try; I'm just new here. – Kev Oct 14 '21 at 08:39
  • ;) I think you should... see my edit – dario Oct 14 '21 at 08:41
  • YES! It finally worked, thank you so much. By the way, how could we have done this with the `mapvalues` function? I have been given this task to perform with the mapvalues function. – Kev Oct 14 '21 at 08:48
  • Regarding `mapvalues`: I don't think there is a good way using `plyr::mapvalues`, since you would have to provide each pair of "value_to_replace" and "replacement_value" individually. That could be quite annoying and error-prone. If you add an MRE in code form (i.e. as described in the link above), I can write you a solution that works for the example. (The problem with the example data you added is that there is no way I can easily import it into my R session. That's why you should either use the output from `dput` or construct the example data in a `data.frame` call.) – dario Oct 14 '21 at 08:54
  • I think I did. If you have a solution using mapvalues, that would be awesome! – Kev Oct 14 '21 at 09:35
  • Hm, still: how can I import this data into R? Since you are new to R, I updated the example data for you. But it's really you who should do this ;) – dario Oct 14 '21 at 09:46
  • You are very welcome. Just remember next time when asking a question to provide a MRE that is as easy to use as possible :) – dario Oct 14 '21 at 10:19
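The comments recommend sharing example data as `dput` output. A minimal sketch of that workflow, using a hypothetical three-row `example_df`: dput() prints R code that recreates the object, which others can paste straight into their session. The round-trip check at the end only demonstrates that nothing is lost along the way:

```r
example_df <- data.frame(
  name = c("John", "Clara", "Smith"),
  sex  = c("Male", "F", "MALE"),
  age  = c(30L, 32L, 54L),
  stringsAsFactors = FALSE
)

# Print code that recreates example_df; this is what you paste into a question
dput(example_df)

# Capture the same text and evaluate it again, as a reader would
code <- paste(capture.output(dput(example_df)), collapse = "\n")
recreated <- eval(parse(text = code))
identical(recreated, example_df)
```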