
I have a data frame containing one variable, "label", and want to add another variable, "gender", based on information from another data frame that also contains the "label" variable. I usually use the match function and it normally works. This time, however, it adds the variable, but every value is NA. I guess this is a basic problem, but I can't figure out a solution.

df1
   label
1  HDJ3
2  K4JS
3  SO25
4  L9HW

df2
   label  gender
1  SO25   m
2  HDJ3   f
3  L9HW   f
4  K4JS   m

df1$gender <- df2$gender[match(df1$label, df2$label)]

What I want is

df1
   label  gender
1  HDJ3   f
2  K4JS   m
3  SO25   m
4  L9HW   f
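If the toy data is typed in exactly as printed, the match() line above does produce this result. Here is a minimal reproduction. It assumes plain character columns, which is an assumption: the real data turns out to differ.

```r
# Toy data typed in exactly as printed above (plain character columns assumed)
df1 <- data.frame(label = c("HDJ3", "K4JS", "SO25", "L9HW"),
                  stringsAsFactors = FALSE)
df2 <- data.frame(label  = c("SO25", "HDJ3", "L9HW", "K4JS"),
                  gender = c("m", "f", "f", "m"),
                  stringsAsFactors = FALSE)

# For each df1$label, match() returns its row position in df2$label
idx <- match(df1$label, df2$label)   # 2 4 1 3

# Use those positions to pull the corresponding gender values across
df1$gender <- df2$gender[idx]
print(df1)
```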

What I get is

df1
   label  gender
1  HDJ3   NA
2  K4JS   NA
3  SO25   NA
4  L9HW   NA
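An all-NA column means that none of the df1 labels found an *exact* match in df2$label. The sketch below shows this mechanism with hypothetical vectors (`keys`, `lookup` and `gender` are made up for illustration). An invisible difference, such as a trailing space, is enough to break the match.

```r
# Keys and a lookup vector that look identical when printed
keys   <- c("HDJ3", "K4JS")
lookup <- c("HDJ3 ", "K4JS")   # "HDJ3 " carries an invisible trailing space
gender <- c("f", "m")

# match() returns NA for every key without an exact match in `lookup`
idx <- match(keys, lookup)
print(idx)                     # first element is NA, second is 2

# Subsetting with an NA index yields NA, which is how NAs end up in the new column
print(gender[idx])

# Quick diagnostics: which keys are missing, and how long are the strings really?
print(keys[!keys %in% lookup])
print(nchar(lookup))
```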

EDIT: The variables are all factors. I've already tried converting them to character, but that doesn't work either. I've also tried the merge function, but then the resulting data frame was completely empty, containing only the variable names. I'd be happy if somebody could help me with this. Thanks, and apologies in advance if this has been asked before.
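Both observations fit a key mismatch rather than a type problem. match() converts factors to character before comparing, so factor versus character alone should not produce NAs. Also, an inner merge() in which no key matches returns a data frame with zero rows but all the column names. The data below is made up for illustration.

```r
# Two factors whose level sets differ (f2 has the extra level "L9HW")
f1 <- factor(c("HDJ3", "K4JS", "SO25"))
f2 <- factor(c("SO25", "HDJ3", "L9HW", "K4JS"))

# match() converts factors to character first, so labels (not integer codes) are compared
m_factor <- match(f1, f2)
m_char   <- match(as.character(f1), as.character(f2))
print(m_factor)   # same positions as m_char

# merge() defaults to an inner join and keeps only keys present in both data frames.
# Here no key matches exactly (note the trailing spaces in b), so no rows survive.
a <- data.frame(label = c("HDJ3", "K4JS"))
b <- data.frame(label = c("HDJ3 ", "K4JS "), gender = c("f", "m"))
res <- merge(a, b, by = "label")
print(nrow(res))  # 0
print(names(res))
```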

EDIT 2: The structures of the two data frames show differences in the variables:

> dput(df1) 
structure(list(label = structure(c(31L, 25L, 7L, 12L, 15L, 32L, 
33L, 24L, 14L, 17L, 1L, 28L, 20L, 6L, 11L, 19L, 9L, 16L, 22L, 
37L, 26L, 39L, 34L, 29L, 13L, 5L, 36L, 4L, 18L, 2L, 23L, 30L, 
3L, 8L, 35L, 27L, 10L, 38L, 21L), .Label = c("09YG", "0FWR", 
"0PZS", "4L78", "56C9", "5B1K", "5CL9", "5RJG", "696K", "8ZOQ", 
"92MB", "95KI", "99H5", "9VOZ", "A8KP", "A9ME", "APA5", "BVDN", 
"DI7S", "E4MS", "EPTR", "H34H", "HRTI", "JLSK", "K472", "KWWO", 
"MHAF", "PSK5", "Q6A4", "S2CK", "S7RU", "SK7H", "SRS8", "TCFS", 
"VQFM", "VWV4", "Z1GE", "ZGBU", "ZQZ7"), class = "factor")), row.names = c(NA, 
-39L), class = "data.frame")

> dput(df2)
structure(list(label = c("S7RU    ", "K472    ", "5CL9    ", 
"95KI    ", "A8KP    ", "-99     ", "SK7H    ", "SRS8    ", "JLSK    ", 
"95KI    ", "-99     ", "9VOZ    ", "APA5    ", "09YG    ", "PSK5    ", 
"E4MS    ", "5B1K    ", "92MB    ", "DI7S    ", "JLSK    ", "696K    "
), gender = c(3, 2, 3, 3, 3, 2, 3, 3, 3, 3, 3, 2, 3, 2, 3, 2, 
3, 2, 3, 3, 3)), row.names = c(NA, -21L), class = "data.frame")

The problem I see is the trailing blank spaces in the label variable of the second data frame (df2). Can anyone tell me where they come from and how I can fix this?
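Every df2 label in the dput() output is padded with blanks to exactly 8 characters (for example "S7RU    " and "-99     "). That pattern hints at a fixed-width or database CHAR source, but this is only a guess. If the file is read with read.table()/read.csv(), their `strip.white = TRUE` argument trims unquoted fields on import. The sketch below instead trims the keys after the fact with base R's trimws(), using a small excerpt of the posted data.

```r
# Excerpt of the posted data: df1 keys are a factor, df2 keys are padded with blanks
df1 <- data.frame(label = factor(c("S7RU", "K472", "5CL9", "09YG")))
df2 <- data.frame(label  = c("S7RU    ", "K472    ", "5CL9    ", "-99     ", "09YG    "),
                  gender = c(3, 2, 3, 2, 2))

# Strip leading/trailing whitespace from the lookup keys
df2$label <- trimws(df2$label)

# Convert the factor to character so both key columns hold plain strings, then look up
df1$gender <- df2$gender[match(as.character(df1$label), df2$label)]
print(df1)
```

Labels from df1 that are genuinely absent from df2 will still come back as NA. That behavior is usually what you want from a lookup.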

jdschu
  • Make sure what you are working with when joining/merging. Is `label` of class factor, or character? You say all variables are integers? That can't be correct, since they contain text. Better to provide sample data using `dput()`. – Wimpel Jun 19 '19 at 11:24
  • Ok, apparently I used the wrong command to examine the data type [typeof() instead of class()], and the variables are all of class factor. Does the data type affect whether match/merge works? – jdschu Jun 19 '19 at 12:47
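The confusion between typeof() and class() has a simple explanation: a factor is stored as integer codes plus a vector of levels. typeof() reports the storage type, and class() reports the factor class. A small illustration with made-up values:

```r
f <- factor(c("HDJ3", "K4JS"))

typeof(f)      # "integer": the underlying storage is integer codes
class(f)       # "factor"
levels(f)      # the labels those codes point to
as.integer(f)  # the raw codes themselves
```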
