0

How to match columns in R and extract value. As an example: I want to match on the basis of Name and City columns of dataframe_one with dataframe_two and then return the output with two another column temp and ID. If it matches it should return TRUE and ID too.

My input is:

dataframe_one

Name    City
Sarah   ON
David   BC
John    KN
Diana   AN
Judy    ON

dataframe_two

Name    City    ID
Dave    ON     1092
Diana   AN     2314
Judy    ON     1290
Ari     KN     1450
Shanu   MN     1983

I want the output to be

Name    City    temp    ID
Sarah   ON   FALSE     NA
David   BC   TRUE     1450
John    KN   TRUE     1983
Diana   AN   FALSE    NA
Judy    ON   FALSE    NA
Ronak Shah
  • 377,200
  • 20
  • 156
  • 213

2 Answers2

0

One thing that makes answering questions of this type easier is if you at least put the data frames in R, like so:

df1 <- data.frame(stringsAsFactors=FALSE,
                  Name = c("Sarah", "David", "John", "Diana", "Judy"),
                  City = c("ON", "BC", "KN", "AN", "ON")
)

df2 <- data.frame(stringsAsFactors=FALSE,
                  Name = c("Dave", "Diana", "Judy", "Ari", "Shanyu"),
                  City = c("ON", "AN", "ON", "KN", "MN"),
                  ID = c(1092, 2314, 1290, 1450, 1983)
)

Then search existing Stack Overflow questions that have answered similar questions (e.g. How to join (merge) data frames (inner, outer, left, right)).

Given that neither of your original dfs contain the column "Temp" you would need to create it in the joined (merged) data frame. We'll be able to help a lot more if you at least make a start and then the community will help you troubleshoot.

That being said, I can't for the life of me figure out how you would generate your output df from the inputs.

biomiha
  • 1,358
  • 2
  • 12
  • 25
0

Using biomiha code to generate df1 and df2:

df1 <- data.frame(stringsAsFactors=FALSE,
                  Name = c("Sarah", "David", "John", "Diana", "Judy"),
                  City = c("ON", "BC", "KN", "AN", "ON")
)

df2 <- data.frame(stringsAsFactors=FALSE,
                  Name = c("Dave", "Diana", "Judy", "Ari", "Shanyu"),
                  City = c("ON", "AN", "ON", "KN", "MN"),
                  ID = c(1092, 2314, 1290, 1450, 1983)
)

you may try:

library(dplyr)

df1 %>% 
  left_join(df2, by = c("Name" = "Name", "City" = "City")) %>%
  mutate(temp = !is.na(ID))

gives the output:

   Name City   ID  temp
1 Sarah   ON   NA FALSE
2 David   BC   NA FALSE
3  John   KN   NA FALSE
4 Diana   AN 2314  TRUE
5  Judy   ON 1290  TRUE
tomasy
  • 3
  • 3