sex <- c("fema", "f", "F", "M", "male", "MALe")
We don't need plyr or any other external packages; base R is enough:
toupper(substr(sex, 1, 1))
Returns:
[1] "F" "F" "F" "M" "M" "M"
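To make it clear what each of the two base functions contributes, here is the same call split into its two steps. The intermediate variable first is only for illustration. The final check is an optional extra that assumes M and F are the only codes you expect:

```r
# Same input as above
sex <- c("fema", "f", "F", "M", "male", "MALe")

# Step 1: substr() takes characters 1 to 1 of every element (it is vectorised)
first <- substr(sex, 1, 1)
print(first)      # "f" "f" "F" "M" "m" "M"

# Step 2: toupper() uppercases every element
sex_fixed <- toupper(first)
print(sex_fixed)  # "F" "F" "F" "M" "M" "M"

# Optional sanity check (assumption: only M/F codes are expected)
stopifnot(all(sex_fixed %in% c("M", "F")))
```

Both functions work on the whole vector at once, so no loop or apply call is needed.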
Edit:
Use the solution above like this. In R, unlike many other programming languages, we very rarely modify an object in place, so we almost always assign the result back to a variable or column:
df$Sex <- toupper(substr(df$Sex, 1, 1))
Or, if you want to preserve the original column:
df$Sex_fixed <- toupper(substr(df$Sex, 1, 1))
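A minimal self-contained version of the two assignments above. The data frame df here is hypothetical, built from the sex vector at the top; only the column name Sex is taken from the snippet:

```r
# Hypothetical data frame with a messy Sex column
df <- data.frame(Sex = c("fema", "f", "F", "M", "male", "MALe"),
                 stringsAsFactors = FALSE)

# Keep the original column and add a cleaned copy next to it
df$Sex_fixed <- toupper(substr(df$Sex, 1, 1))

# To overwrite instead, assign back to the same column:
# df$Sex <- toupper(substr(df$Sex, 1, 1))

print(df)
```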
Another edit:
As requested, here are solutions using plyr::mapvalues:
mre_data$sex_fixed <- plyr::mapvalues(mre_data$sex, c("Male", "MALE", "M", "male", "F", "FEMALE", "female"), c("M", "M", "M", "M", "F", "F", "F"))
mre_data
Is now:
name sex age sex_fixed
1 John Male 30 M
2 Clara F 32 F
3 Smith MALE 54 M
4 Ray M 42 M
5 karen FEMALE 11 F
6 Ruba female 34 F
7 Josh MALE 67 M
8 Jennifer F 49 F
9 David male 27 M
10 Maria FEMALE 18 F
11 Sam M 30 M
While this works, it doesn't make much sense to me, since we have to specify each replacement pair individually. What we actually want is to apply a rule (take the first letter of each entry and uppercase it) to every entry. mapvalues is intended for use cases where there is no simple rule and each replacement pair has to be spelled out.
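For comparison, the same explicit pair-by-pair mapping can be done without plyr by indexing a named vector. This is a base R sketch, not what mapvalues does internally; the sex values are copied from the printed mre_data above. One difference worth knowing: mapvalues leaves values that are not in its from argument unchanged, whereas this lookup turns them into NA.

```r
# The sex column as printed in the mre_data table above
sex <- c("Male", "F", "MALE", "M", "FEMALE", "female",
         "MALE", "F", "male", "FEMALE", "M")

# Lookup table: names are the raw values, elements are the replacements
lookup <- c(Male = "M", MALE = "M", M = "M", male = "M",
            F = "F", FEMALE = "F", female = "F")

# Indexing a named vector with a character vector returns the matching
# elements; unname() drops the names carried over from the lookup
sex_fixed <- unname(lookup[sex])
print(sex_fixed)
```

It still has the same drawback as mapvalues: every spelling has to be listed by hand.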
Alternatively, and this really makes no sense (you should use the direct route via x <- toupper(substr(...)) instead), we can combine the rule-based solution with the mapvalues solution to get a very confused and overcomplicated result ;) At least we don't have to hard-code each replacement pair:
mre_data$sex_fixed2 <- plyr::mapvalues(mre_data$sex, unique(mre_data$sex), toupper(substr(unique(mre_data$sex), 1, 1)))
mre_data
Is now:
name sex age sex_fixed sex_fixed2
1 John Male 30 M M
2 Clara F 32 F F
3 Smith MALE 54 M M
4 Ray M 42 M M
5 karen FEMALE 11 F F
6 Ruba female 34 F F
7 Josh MALE 67 M M
8 Jennifer F 49 F F
9 David male 27 M M
10 Maria FEMALE 18 F F
11 Sam M 30 M M
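The unique-then-map idea behind sex_fixed2 also has a plain base R form using match(). For a cheap rule like this one it gains nothing over calling toupper(substr(...)) directly, but the same pattern can help when the per-value function is expensive and the column contains many repeats. A sketch, with the sex values copied from the table above:

```r
sex <- c("Male", "F", "MALE", "M", "FEMALE", "female",
         "MALE", "F", "male", "FEMALE", "M")

# Apply the rule only to the distinct values...
u <- unique(sex)
fixed_u <- toupper(substr(u, 1, 1))

# ...then expand back to full length: match() returns, for each element
# of sex, the position of that value in u
sex_fixed2 <- fixed_u[match(sex, u)]
print(sex_fixed2)
```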