0

I'm a beginner with R and I want to merge two datasets together, based on the ID column. If the ID number of dataset 2 exists in dataset 1, then I want the data for this particular row from dataset 2 to be added to the row in dataset 1.

Secondly, if there is a match, I want to put a "1" in a new column called match for that specific row/ match and "0" if there wasn't a match.

Example:

Dataset 1: 

Id     category 
123    3
124    1
125    2

Dataset 2: 

Id     score category 
123    0.24    3
124    0.83    1 
126    0.92    2  

Final example with the added column:

Id      score category  match
123    0.24    3         1 
124    0.83    1         1
125    NA      1         0
126    0.92    2         1

I've tried this so far (and some other combinations), but that did not give me good results:

data <- merge(df1, df2, by ="ID" , all.x = TRUE)

I would really appreciate any help!

Reproducible code:

df1 <- data.frame(ID=c("123","124","125"), category=c(3,1,2)
df2 <- data.frame(ID=c("123","124","126"), score=("0.24","0.83","0.92"), category=c("3","1","2")
Veraaa
  • 301
  • 2
  • 6
  • 15
  • Do you have a `character` NA? It should not be placed in quotes. The input data showed and the 'df2' in reproducible example are not matching. To create a binary column, just use `%in%` with `as.integer` – akrun Aug 16 '16 at 12:57
  • your example data with expected ouput and the reproducible code are not same. You don't have id `126` ? – user5249203 Aug 16 '16 at 13:08
  • 1
    Possible duplicate of [How to join (merge) data frames (inner, outer, left, right)?](http://stackoverflow.com/questions/1299871/how-to-join-merge-data-frames-inner-outer-left-right) – asachet Aug 16 '16 at 13:20
  • Thanks for your comments! Sorry for the confusion, I made some mistakes earlier when posting this! I've edited id 126 in dataset 2. The idea is that there will be some specific id's in dataset 1 that are not present in dataset 2 (for instance there is no id 125 in dataset 2). – Veraaa Aug 16 '16 at 13:28

2 Answers2

0

You are almost there. I have slightly modified your example.

df1 <- data.frame(ID = c(123, 124, 125),
                  category=c(3, 1, 2))
df1

df2 <- data.frame(ID = c(123, 124, NA),
                  score = c(0.24, 0.83, 0.35),
                  category = c(3, 1, 2))
df2

df2$match <- 1
df2

data <- merge(df1, df2, by = c("ID", "category") , all.x = TRUE)
data$match[is.na(data$match)] <- 0
data
djhurio
  • 5,437
  • 4
  • 27
  • 48
  • This does not work. It needs to be `all = TRUE` and you should also create the `match` variable as per the OP – Sotos Aug 16 '16 at 13:29
0

If you can provide appropriate result for the data you provided, I can update it if needed. However, you can flag based ID in df2 that matches in df1 by the following steps.

update: Looks like you only want to display the x value and not y value. Updated the output by calling all.x =T

data <- merge(df1, df2, by = c('ID', 'category') ,  all.x = T)
data$match <- ifelse(data$ID %in% df2$ID, 1, 0)

output

    ID category score match
1 123        3  0.24     1
2 124        1  0.83     1
3 125        2  <NA>     0

data

 df1 <- data.frame(ID=c("123","124","125"), category=c(3,1,2))
 df2 <- data.frame(ID=c("123","124","126"), score= c("0.24","0.83","0.92"),
               category=c(3,1,2))
user5249203
  • 4,436
  • 1
  • 19
  • 45