
I have two data frames that I need to merge on the candidate and constituency columns. The problem is that the same candidate's name is often spelled differently in the two data frames.

For example, one data frame has Dr. Ashutosh Singh where the other has Dr Ashutosh Singh, and one has Dr. Vikash Singh where the other has just Vikash Singh.
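Both discrepancies above (a missing full stop after the title, and a missing title) can be removed by normalizing the names before any comparison. The following is a minimal sketch, not a complete cleaner: `normalize_name` is a hypothetical helper, and it assumes that "DR" is the only title that needs stripping.

```r
# Hypothetical helper: upper-case, drop punctuation, strip a leading "DR"
# title, and collapse repeated whitespace.
normalize_name <- function(x) {
  x <- toupper(x)
  x <- gsub("[[:punct:]]", " ", x)               # "DR." becomes "DR "
  x <- gsub("^[[:space:]]*DR[[:space:]]+", "", x) # drop a leading title only
  x <- gsub("[[:space:]]+", " ", x)              # collapse runs of spaces
  trimws(x)
}

# The example names from above all reduce to a title-free form.
normalize_name(c("Dr. Ashutosh Singh", "Dr Ashutosh Singh",
                 "Dr. Vikash Singh", "Vikash Singh"))
```

Names that differ in more than titles and punctuation (for example, alternative spellings) would still need approximate matching.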

I'm attaching screenshots of both data frames.

[Screenshot: first data frame]

[Screenshot: second data frame]

I need to map the first data frame's columns CAND_NAME and AC_NAME to the second data frame's columns candidate and constituency, respectively, and merge the two data frames into one.
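When the names already agree, a two-column mapping like this can be written directly with `by.x` and `by.y` in `merge()`. The sketch below uses invented toy data; the column name `Constituency` and the extra columns `VOTES` and `Party` are assumptions made for illustration only.

```r
# Toy stand-ins for the two data frames; every value here is invented.
first <- data.frame(
  CAND_NAME = c("ASHUTOSH SINGH", "VIKASH SINGH"),
  AC_NAME   = c("AC ONE", "AC TWO"),
  VOTES     = c(100, 200),                 # hypothetical extra column
  stringsAsFactors = FALSE
)
second <- data.frame(
  Candidate    = c("ASHUTOSH SINGH", "VIKASH SINGH"),
  Constituency = c("AC ONE", "AC THREE"),  # second row differs on purpose
  Party        = c("P1", "P2"),            # hypothetical extra column
  stringsAsFactors = FALSE
)

# Match on both keys at once; all = TRUE keeps rows that have no partner.
merged <- merge(first, second,
                by.x = c("CAND_NAME", "AC_NAME"),
                by.y = c("Candidate", "Constituency"),
                all = TRUE)
merged
```

Rows whose key pair has no exact partner are kept with NA in the other table's columns, which makes the spelling mismatches easy to list.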

I'm also sharing the Excel file and my R code. I have to merge the three sheets into one.
Link for the Excel file

R Code

setwd("/home/lenovo/Documents/r_prog/")
library(readxl)

candidate2017=read_excel("LA 2017.xlsx", sheet = 1)
electors2017=read_excel("LA 2017.xlsx", sheet = 2)

ManipurCandidates2017ADR=read_excel("LA 2017.xlsx", sheet = 3)

ManipurCandidate2017=candidate2017[grepl("Manipur", candidate2017$ST_NAME),]
ManipurElectors2017=electors2017[grepl("Manipur", electors2017$ST_NAME),]


# Upper-case every character column of a data frame
upperCaseChars = function(df) {
  data.frame(lapply(df, function(v) {
    if (is.character(v)) toupper(v) else v
  }))
}

ManipurElectors2017 = upperCaseChars(ManipurElectors2017)
ManipurCandidates2017ADR = upperCaseChars(ManipurCandidates2017ADR)
ManipurCandidate2017 = upperCaseChars(ManipurCandidate2017)


View(ManipurCandidate2017)
View(ManipurElectors2017)
View(ManipurCandidates2017ADR)

mergedData = merge(ManipurCandidate2017, ManipurCandidates2017ADR,
                   by.x = c('CAND_NAME'), by.y = c('Candidate'), all = TRUE)

I am new to R; please help. Thanks in advance.

Terru_theTerror
  • Hello and welcome to StackOverflow. Please take some time to read the help page, especially the sections named ["What topics can I ask about here?"](http://stackoverflow.com/help/on-topic) and ["What types of questions should I avoid asking?"](http://stackoverflow.com/help/dont-ask). And more importantly, please read [the Stack Overflow question checklist](http://meta.stackexchange.com/q/156810/204922). You might also want to learn about [Minimal, Complete, and Verifiable Examples](http://stackoverflow.com/help/mcve). – 000andy8484 Aug 29 '18 at 09:11
  • It is not advisable to insert external links to data on SO. In fact, I am not taking the risk to click it. Please refer to this [Meta question](https://meta.stackexchange.com/questions/176460/how-to-paste-data-from-r-to-stackoverflow) and check out the dput() function (also in this [R-bloggers post](https://www.r-bloggers.com/converting-an-r-object-to-text-with-dput/)). – 000andy8484 Aug 29 '18 at 09:12
  • Possible duplicate: https://stackoverflow.com/questions/21165256/r-merge-data-frames-allow-inexact-id-matching-e-g-with-additional-characters – 000andy8484 Aug 29 '18 at 09:15

1 Answer

One possible solution is approximate string matching (fuzzy matching); check out the agrep() function. You can embed agrep() in a merge() call. I cannot write the exact code because you have not provided a reproducible example.

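The call would look something like this (dat1, dat2, ID1 and ID2 are placeholder names, and this form only matches the first ID of dat1):

dat3 <- merge(x = dat1, y = dat2[agrep(dat1$ID1[1], dat2$ID2), ], all = TRUE)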
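To match every row rather than only the first ID, one option is to compute edit distances between all pairs of names and keep the closest candidate. The sketch below swaps agrep() for adist() from the utils package, which returns the distances themselves; the toy data, the columns `x` and `y`, and the cutoff `max_dist` are assumptions for illustration and would need tuning on the real sheets.

```r
# Toy data reusing the placeholder names dat1, dat2, ID1, ID2; values invented.
dat1 <- data.frame(ID1 = c("DR. ASHUTOSH SINGH", "DR. VIKASH SINGH"),
                   x = 1:2, stringsAsFactors = FALSE)
dat2 <- data.frame(ID2 = c("VIKASH SINGH", "DR ASHUTOSH SINGH"),
                   y = c("a", "b"), stringsAsFactors = FALSE)

# Edit-distance matrix: rows are dat1 names, columns are dat2 names.
d <- adist(dat1$ID1, dat2$ID2)

# Index and distance of the closest dat2 name for each dat1 name.
best <- apply(d, 1, which.min)
best_dist <- d[cbind(seq_along(best), best)]

# Accept a match only within a cutoff (assumed value; tune on real data).
max_dist <- 4
dat1$ID2 <- ifelse(best_dist <= max_dist, dat2$ID2[best], NA_character_)

# With the matched key attached, an ordinary exact merge does the rest.
dat3 <- merge(dat1, dat2, by = "ID2", all = TRUE)
dat3
```

Two different names in dat1 can end up pointing at the same row of dat2, so it is worth inspecting the accepted pairs, especially those close to the cutoff, before trusting the merged result.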
000andy8484