I am new to R and am having issues figuring out how to create a new dataset from 2 already existing datasets that I have.

I have two datasets that look like this. Some MRNs appear in both datasets and some in only one.

  MRN  Age  Status
  222  50   likely
  345  64   likely
  555  75   unknown
  888  56   likely
  675  52   unknown

Second Dataset

  MRN  Age  Status
  222  50   likely
  446  35   unknown
  555  75   unknown
  888  56   likely
  678  48   unknown

I wanted to find the MRNs that appear in both datasets, so I used:

MatchedMRN <- intersect(Tissue$MRN, ctDNA$MRN)

Now I would like to create a new dataset that contains the matched MRN values together with all the other variables/columns from the two original datasets that belong to those MRNs. Is there any way to do this in R?

Thank you!

1 Answer

You can solve this type of problem with the join functions from the dplyr package. Here is an example:

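library(dplyr)

# Two example data frames that share the key column "index"
df1 <- data.frame(index = c("1", "2", "3"), age = c(20, 44, 32))
df2 <- data.frame(index = c("1", "2", "4"), sex = c("M", "M", "F"))

# Keep every key that occurs in either data frame
df1and2 <- full_join(df1, df2, by = "index")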

With this full join, rows in df1 are matched to rows in df2 on index, which acts as the "primary key". The result is a data frame with the columns index, age and sex. Where a key is present in only one data frame, the columns coming from the other data frame are NA.
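If you only want the MRNs that occur in both datasets (the same set your intersect() call returns), an inner join is probably closer to what you need than a full join. Here is a minimal sketch using the values from your question; which table is Tissue and which is ctDNA is my assumption, and the names Matched, "_tissue" and "_ctDNA" are just illustrative:

```r
library(dplyr)

# Data copied from the question. Assumption: the first table is Tissue
# and the second is ctDNA (the question does not say which is which).
Tissue <- data.frame(
  MRN    = c(222, 345, 555, 888, 675),
  Age    = c(50, 64, 75, 56, 52),
  Status = c("likely", "likely", "unknown", "likely", "unknown")
)
ctDNA <- data.frame(
  MRN    = c(222, 446, 555, 888, 678),
  Age    = c(50, 35, 75, 56, 48),
  Status = c("likely", "unknown", "unknown", "likely", "unknown")
)

# inner_join() keeps only rows whose MRN appears in both data frames.
# Age and Status exist in both inputs, so the suffixes tell them apart.
Matched <- inner_join(Tissue, ctDNA, by = "MRN",
                      suffix = c("_tissue", "_ctDNA"))

print(Matched)
```

Matched then has one row per shared MRN, with the Age and Status columns from each source side by side. If you would rather not depend on dplyr, base R's merge(Tissue, ctDNA, by = "MRN") should give a similar result, with .x and .y as the default suffixes.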

Luke Hayden