
I have three data sets that I would like to merge. The first one is the coded data set:

ID   Gender Race
1    0      1
2    1      3
3    1      2

The second and third data sets are the code tables with descriptions:

Code  Gender
0     Female
1     Male

and

Code  Race
1     White
2     Black
3     Asian

I want to see if there is a better way than writing a long series of merge statements, because I have many more variables than these two that need to be merged with their descriptions. I was thinking that a for loop or lapply/sapply might be good for this task.

I want to make it look like:

ID   Gender   Race
1    Female   White   
2    Male     Asian 
3    Male     Black

Thank you very much for your help!

Christopher Yee

1 Answer


A small dplyr solution might be:

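# Coded data set from the question
main = read.csv(textConnection("ID,   Gender, Race
1,   0,      1
2,   1,      3
3,   1,      2"))

# Code tables; strip.white = TRUE drops the padding around the descriptions
gen = read.csv(textConnection("Code,  Gender
0,     Female
1,     Male"), stringsAsFactors = FALSE, strip.white = TRUE)

race = read.csv(textConnection("Code,  Race
1,     White
2,    Black
3,     Asian"), stringsAsFactors = FALSE, strip.white = TRUE)

# Rename so each code table shares its key column name with main
colnames(race) = c("Race", "RaceStr")
colnames(gen) = c("Gender", "GenderStr")

library(dplyr) # install.packages("dplyr")

# Join on the shared key columns, then keep only the descriptions
main %>%
  inner_join(gen, by = "Gender") %>%
  inner_join(race, by = "Race") %>%
  select(ID, GenderStr, RaceStr)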

The approach I'd take is to rename the key column of each code description table to match the variable you're trying to make more readable, then join on that shared column name.

You might also need the plyr package if you don't have it.
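Since the question asks about a loop for many variables, the same mapping idea can be sketched in base R without any joins. This is only a sketch under assumptions: the list name `lookups` is hypothetical, and each code table is assumed to have a `Code` column plus a description column named after the variable it decodes, as in the question. The data is read with `read.table(header = TRUE, text = ...)`, as suggested in the comments.

```r
# Coded data set from the question
main <- read.table(header = TRUE, text = "ID Gender Race
1  0      1
2  1      3
3  1      2")

# Hypothetical container: one code table per coded column,
# named after the column in `main` that it decodes.
lookups <- list(
  Gender = read.table(header = TRUE, stringsAsFactors = FALSE,
                      text = "Code Gender
0    Female
1    Male"),
  Race = read.table(header = TRUE, stringsAsFactors = FALSE,
                    text = "Code Race
1    White
2    Black
3    Asian")
)

decoded <- main
for (v in names(lookups)) {
  tab <- lookups[[v]]
  # match() gives, for each code in main, its row in the code table;
  # codes with no entry in the table become NA.
  decoded[[v]] <- tab[[v]][match(main[[v]], tab$Code)]
}

decoded
```

Adding another variable then only means adding another named entry to `lookups`. Unlike `inner_join`, this keeps rows whose code is missing from a table and marks the description as `NA`.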

Akhil Nair
  • Thanks for the help, but the dplyr package can't seem to be installed. – Christopher Yee May 21 '15 at 15:34
  • Installing package into ‘/home/cyee/R/x86_64-pc-linux-gnu-library/3.1’ (as ‘lib’ is unspecified) trying URL 'http://cran.rstudio.com/src/contrib/dplyr_0.4.1.tar.gz' Content type 'application/x-gzip' length 891010 bytes (870 Kb) opened URL ================================================== downloaded 870 Kb * installing *source* package ‘dplyr’ ... ** package ‘dplyr’ successfully unpacked and MD5 sums checked ** libs Error: package ‘Rcpp’ 0.11.2 was found, but >= 0.11.3 is required by ‘dplyr’ * removing ‘/home/cyee/R/x86_64-pc-linux-gnu-library/3.1/dplyr’ – Christopher Yee May 21 '15 at 15:54
  • Update the package `Rcpp`. – Akhil Nair May 21 '15 at 16:06
  • Another common option for reading plain tables as in the OP is `read.table(header=TRUE,text="paste_OP_text")`. That way, you don't have to add commas. – Frank May 21 '15 at 16:07
  • I get these two messages: 1) `Warning message: In inner_join_impl(x, y, by$x, by$y) : joining factors with different levels, coercing to character vector` and 2) `Error in tbl_vars(y) : argument "y" is missing, with no default` – Christopher Yee May 21 '15 at 19:15