Merge data in r based on conditions and creating a new column

Question

I'm a beginner with R and I want to merge two datasets together, based on the ID column. If the ID number of dataset 2 exists in dataset 1, then I want the data for this particular row from dataset 2 to be added to the row in dataset 1.

Secondly, if there is a match, I want to put a "1" in a new column called match for that specific row/ match and "0" if there wasn't a match.

Example:

Dataset 1: 

Id     category 
123    3
124    1
125    2

Dataset 2: 

Id     score category 
123    0.24    3
124    0.83    1 
126    0.92    2

Final example with the added column:

Id      score category  match
123    0.24    3         1 
124    0.83    1         1
125    NA      1         0
126    0.92    2         1

I've tried this so far (and some other combinations), but that did not give me good results:

data <- merge(df1, df2, by ="ID" , all.x = TRUE)

I would really appreciate any help!

Reproducible code:

df1 <- data.frame(ID=c("123","124","125"), category=c(3,1,2)
df2 <- data.frame(ID=c("123","124","126"), score=("0.24","0.83","0.92"), category=c("3","1","2")

Do you have a `character` NA? It should not be placed in quotes. The input data showed and the 'df2' in reproducible example are not matching. To create a binary column, just use `%in%` with `as.integer` — akrun, Aug 16 '16 at 12:57
your example data with expected ouput and the reproducible code are not same. You don't have id `126` ? — user5249203, Aug 16 '16 at 13:08
Possible duplicate of [How to join (merge) data frames (inner, outer, left, right)?](http://stackoverflow.com/questions/1299871/how-to-join-merge-data-frames-inner-outer-left-right) — asachet, Aug 16 '16 at 13:20
Thanks for your comments! Sorry for the confusion, I made some mistakes earlier when posting this! I've edited id 126 in dataset 2. The idea is that there will be some specific id's in dataset 1 that are not present in dataset 2 (for instance there is no id 125 in dataset 2). — Veraaa, Aug 16 '16 at 13:28

score 0 · Accepted Answer · answered Aug 16 '16 at 13:09

0

You are almost there. I have slightly modified your example.

df1 <- data.frame(ID = c(123, 124, 125),
                  category=c(3, 1, 2))
df1

df2 <- data.frame(ID = c(123, 124, NA),
                  score = c(0.24, 0.83, 0.35),
                  category = c(3, 1, 2))
df2

df2$match <- 1
df2

data <- merge(df1, df2, by = c("ID", "category") , all.x = TRUE)
data$match[is.na(data$match)] <- 0
data

answered Aug 16 '16 at 13:09

djhurio

5,437
4
27
48

This does not work. It needs to be `all = TRUE` and you should also create the `match` variable as per the OP – Sotos Aug 16 '16 at 13:29

user5249203 · Answer 2 · 2016-08-16T13:59:49.893

If you can provide appropriate result for the data you provided, I can update it if needed. However, you can flag based ID in df2 that matches in df1 by the following steps.

update: Looks like you only want to display the x value and not y value. Updated the output by calling all.x =T

data <- merge(df1, df2, by = c('ID', 'category') ,  all.x = T)
data$match <- ifelse(data$ID %in% df2$ID, 1, 0)

output

    ID category score match
1 123        3  0.24     1
2 124        1  0.83     1
3 125        2  <NA>     0

data

 df1 <- data.frame(ID=c("123","124","125"), category=c(3,1,2))
 df2 <- data.frame(ID=c("123","124","126"), score= c("0.24","0.83","0.92"),
               category=c(3,1,2))

It should be `by = c('ID', 'category')` – Sotos Aug 16 '16 at 13:30 — Sotos, Aug 16 '16 at 13:30

Merge data in r based on conditions and creating a new column

2 Answers2