
I have two large datasets. I originally wanted to compile a list of the rsIDs common to both, so I wrote the following:

# Read both datasets
file1 <- read.csv("CKD.csv", header = TRUE, sep = ",")
file2 <- read.csv("eGFR.csv", header = TRUE, sep = ",")

# Keep the rsIDs in file1 that also occur in file2. Indexing file1 with
# match(file1$rsID., file2$rsID.) would use positions from file2, not file1.
common <- file1$rsID.[file1$rsID. %in% file2$rsID.]

# Drop missing IDs and write the result directly; no intermediate file is needed
write.table(data.frame(rsID. = common[!is.na(common)]), "rs_matches_final.csv",
  sep = ",", row.names = FALSE)
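A small self-contained sketch of why `%in%` (or `intersect()`) is safer than indexing one vector with `match()` results from the other. The two data frames are made-up stand-ins for CKD.csv and eGFR.csv, assuming only that both have an `rsID.` column:

```r
# Hypothetical stand-ins for CKD.csv and eGFR.csv (only the rsID. column is modelled)
file1 <- data.frame(rsID. = c("rs1", "rs2", "rs3", "rs4"), stringsAsFactors = FALSE)
file2 <- data.frame(rsID. = c("rs9", "rs3", "rs1", "rs2"), stringsAsFactors = FALSE)

# match() returns positions in file2, so using them to index file1 picks unrelated rows
idx <- match(file1$rsID., file2$rsID.)
wrong <- file1$rsID.[idx]

# %in% returns a logical vector aligned with file1, so the subset is correct
common <- file1$rsID.[file1$rsID. %in% file2$rsID.]

# intersect() gives the same IDs (unique values, in file1's order)
common2 <- intersect(file1$rsID., file2$rsID.)

print(wrong)   # "rs3" "rs4" "rs2" NA
print(common)  # "rs1" "rs2" "rs3"
```

Note that `wrong` contains `rs4`, which is not in file2 at all; the `match()` version only gives the right answer when both files happen to list the shared IDs in the same positions.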

This worked. Now I want additional information for those rsIDs, for example the chromosomal location. Is there a way to use the result above to pull the remaining columns for those rsIDs from the datasets?

For example: suppose that rs1, rs2, and rs3 are in both files.

x <- c("rs1", "rs2", "rs3")

f(x) = the variantID for rs1, rs2, and rs3

Ideally I would get more than just the variant ID, but this is only an example.
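One way to express the `f(x)` above in base R is a lookup with `match(x, data$rsID.)`, which returns row positions in the data frame being searched. This is a minimal sketch; the `variantID` and `chr` columns and their values are hypothetical placeholders for whatever the real files contain:

```r
# Hypothetical data standing in for one of the real files; real column names may differ
file1 <- data.frame(
  rsID.     = c("rs1", "rs2", "rs3", "rs4"),
  variantID = c("v1", "v2", "v3", "v4"),
  chr       = c(1, 2, 2, 7),
  stringsAsFactors = FALSE
)

# f(x): return all columns of the rows whose rsID is in x, in the order of x.
# An rsID that is not found produces a row of NAs.
f <- function(x, data) {
  rows <- data[match(x, data$rsID.), , drop = FALSE]
  rownames(rows) <- NULL
  rows
}

x <- c("rs1", "rs2", "rs3")
res <- f(x, file1)
print(res$variantID)  # "v1" "v2" "v3"
```

Because every column is returned, the chromosome and any other fields come along with the variant ID.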

I have tried using match.data.frame from EcFun and an inner join from plyr. Thanks.
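For comparison, base R's `merge()` performs an inner join by default (`all = FALSE`), keeping only rsIDs present in both data frames and combining their columns into one row per match. A sketch with made-up columns (`chr`, `pos` in the CKD stand-in and `pval` in the eGFR stand-in are assumptions, not the real file layout):

```r
# Hypothetical columns: CKD stand-in carries location, eGFR stand-in a p-value
file1 <- data.frame(rsID. = c("rs1", "rs2", "rs3", "rs4"),
                    chr = c(1, 2, 2, 7), pos = c(1000, 2500, 3100, 400),
                    stringsAsFactors = FALSE)
file2 <- data.frame(rsID. = c("rs9", "rs3", "rs1", "rs2"),
                    pval = c(0.5, 0.01, 0.2, 0.03),
                    stringsAsFactors = FALSE)

# Inner join on the shared key; rows are sorted by rsID. (sort = TRUE is the default)
both <- merge(file1, file2, by = "rsID.")
print(both)
```

If both files have other columns with the same name, `merge()` disambiguates them with `.x`/`.y` suffixes, which can be changed through its `suffixes` argument.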

Ctat41

1 Answer


I was able to get this to work using the select() and filter() functions from the dplyr package.
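The answer does not include code, so this is only a guess at what a filter()/select() approach could look like, using hypothetical stand-in data and assuming the shared rsIDs are computed first:

```r
library(dplyr)

# Hypothetical stand-ins for CKD.csv and eGFR.csv
file1 <- data.frame(rsID.     = c("rs1", "rs2", "rs3", "rs4"),
                    variantID = c("v1", "v2", "v3", "v4"),
                    chr       = c(1, 2, 2, 7),
                    pos       = c(1000, 2500, 3100, 400),
                    stringsAsFactors = FALSE)
file2 <- data.frame(rsID. = c("rs9", "rs3", "rs1", "rs2"), stringsAsFactors = FALSE)

common_ids <- intersect(file1$rsID., file2$rsID.)

result <- file1 %>%
  filter(rsID. %in% common_ids) %>%  # keep only rsIDs found in both files
  select(rsID., variantID, chr)      # keep the columns of interest

print(result)
```

If columns from both files are needed, `dplyr::inner_join(file1, file2, by = "rsID.")` combines them in one step instead.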

Ctat41