I am new to R and working on expanding my knowledge. I am reading An Introduction to Data Cleaning with R by Edwin de Jonge and Mark van der Loo, and I am working on exercise 2.4. I would appreciate it if someone could confirm my approach to this specific problem. This is the original data:
// Survey data. Created : 21 May 2013
// Field 1: Gender
// Field 2: Age (in years)
// Field 3: Weight (in kg)
M;28;81.3
male;45;
Female;17;57,2
fem.;64;62.8
This is a cleaner version that I was able to construct:
df:
  Gender Age..in.years. Weight..in.kg.
1      M             28           81.3
2   male             45           <NA>
3 Female             17           57,2
4   fem.             64           62.8
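For reference, a data frame of this shape can be produced roughly as follows. This is only a minimal sketch, not necessarily how the exercise intends it: the raw file is inlined as a character vector (an assumption; the exercise reads it from disk), every column is kept as character, and the names txt, is_comment and field_names are made up for illustration.

```r
# Raw file contents, inlined so the sketch is self-contained
# (assumption: in the exercise these lines come from a file on disk).
txt <- c(
  "// Survey data. Created : 21 May 2013",
  "// Field 1: Gender",
  "// Field 2: Age (in years)",
  "// Field 3: Weight (in kg)",
  "M;28;81.3",
  "male;45;",
  "Female;17;57,2",
  "fem.;64;62.8"
)

# Comment lines start with "//"; everything else is a data record.
is_comment <- grepl("^//", txt)

# Take the field names from the "// Field n: ..." comment lines.
field_names <- sub("^// Field [0-9]+: ", "", txt[is_comment][-1])

# Parse the records: ";" separates fields, empty fields become NA,
# and all columns stay character so that "57,2" is not mangled.
df <- read.table(
  text = txt[!is_comment],
  sep = ";",
  col.names = make.names(field_names),
  colClasses = "character",
  na.strings = ""
)
```

Here make.names turns "Age (in years)" into the syntactically valid name Age..in.years., which is where the dotted column names come from.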
This is what I get after recoding with adist:
D:
  rawtext  coded
1       M   male
2    male   male
3  Female female
4    fem. female
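The recoding step that gives a table like D can be sketched as follows. It assumes the approach of picking, for each raw value, the code with the smallest edit distance from adist; the variable names rawtext, codes, d and nearest are illustrative only.

```r
# Raw gender values as they appear in the data
rawtext <- c("M", "male", "Female", "fem.")

# The clean codes every raw value should be mapped to
codes <- c("male", "female")

# Edit-distance matrix: one row per raw value, one column per code
d <- adist(rawtext, codes)
dimnames(d) <- list(rawtext, codes)

# For each raw value, the index of the closest code
nearest <- apply(d, 1, which.min)

# Lookup table from raw value to its nearest code
D <- data.frame(rawtext = rawtext, coded = codes[nearest],
                stringsAsFactors = FALSE)
```

Note that adist is case-sensitive by default, so "M" only ends up closer to "male" than to "female" because "female" is longer.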
Now I have to turn the Gender column into a factor variable with the labels man and woman. I am not sure how to proceed, and I am thinking of replacing the Gender column of the data with the following factor:
f <- factor(D$coded, levels = c("male", "female"), labels = c("man", "woman"))
which returns:
[1] man man woman woman
Levels: man woman
Is this correct, or plain wrong? Is there a way to use transform to change the Gender variable in df directly? Or would it be better to do:
df$Gender <- plyr::revalue(D$coded, c(male = "man", female = "woman"))
Or is there another way to change the observations of the Gender variable to "man" or "woman" without using multiple ifelse calls?
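To make the transform variant concrete, here is a minimal sketch. It rebuilds small stand-ins for df and D (only the Gender column, an assumption made to keep the example short) and assumes the rows of D are in the same order as the rows of df.

```r
# Small stand-ins for the data shown above (Gender column only)
df <- data.frame(Gender = c("M", "male", "Female", "fem."),
                 stringsAsFactors = FALSE)
D <- data.frame(rawtext = df$Gender,
                coded = c("male", "male", "female", "female"),
                stringsAsFactors = FALSE)

# Replace Gender in place with a factor built from the recoded values;
# levels names the existing values, labels gives their new names.
df <- transform(
  df,
  Gender = factor(D$coded,
                  levels = c("male", "female"),
                  labels = c("man", "woman"))
)
```

The levels/labels pair does the renaming in one step, so no ifelse chain is needed.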
I have been trying to find answers by reading more about factors, but nothing quite like this comes up anywhere. Thanks.