
I need to match firm names across two datasets. The names may differ slightly between the datasets, and they are not unique in either one: each may be repeated many times.

Although some of these names coincide exactly in the two datasets, I want to compare all the similar names and then pick the exact match.

I have tried pmatch, but it returns something strange. The same goes for agrep, which gives me all NA.

Any suggestions?

Macrina
    Welcome to StackOverflow! Please read the info about [how to ask a good question](http://stackoverflow.com/help/how-to-ask) and how to give a [reproducible example](http://stackoverflow.com/questions/5963269). This will make it much easier for others to help you. – Sotos Jun 21 '18 at 09:31

1 Answer


Can you give an example of the data? Without example data, my best guess is something like this, using grepl():

bad.names <- rep(c('tom','tommy','tommy-boy','john','johnny','johnny-o'),10)

print(table(bad.names))
#> bad.names
#>      john    johnny  johnny-o       tom     tommy tommy-boy 
#>        10        10        10        10        10        10 

# Every name containing a 't' becomes 'tommy'; everything else becomes 'johnny'
new.names <- ifelse(grepl('t', bad.names), 'tommy','johnny')

print(table(new.names))
#> new.names
#> johnny  tommy 
#>     30     30 
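
If the names differ in less predictable ways than a single letter, a distance-based comparison may fit the question better. The following is only a minimal sketch using base R's adist(), which computes the edit distance between every pair of strings. The firm names, the variable names (firms_a, firms_b, lookup, candidates) and the choice to ignore case are all assumptions for illustration, since no sample data was posted.

```r
# Hypothetical firm names; the question did not include sample data.
firms_a <- c("Acme Corp", "Acme Corp", "Globex Inc", "Initech")
firms_b <- c("ACME Corp.", "Globex, Inc.", "Initech LLC", "Umbrella")

# Names repeat, so compare the unique values only.
ua <- unique(firms_a)
ub <- unique(firms_b)

# Edit distance between every pair of names (rows: ua, columns: ub).
d <- adist(ua, ub, ignore.case = TRUE)
dimnames(d) <- list(ua, ub)

# All candidates for each name in firms_a, closest first, for manual review.
candidates <- lapply(ua, function(nm) sort(d[nm, ]))
names(candidates) <- ua

# Closest candidate per unique name, with its distance.
nearest <- apply(d, 1, which.min)
lookup <- data.frame(name_a = ua, name_b = ub[nearest],
                     distance = d[cbind(seq_along(ua), nearest)],
                     stringsAsFactors = FALSE)

# Map the result back onto the full (repeated) vector.
matched <- lookup$name_b[match(firms_a, lookup$name_a)]
```

Here candidates lets you inspect all the similar names before choosing, and lookup holds the closest one for each unique name. The closest name is not necessarily the right one, so it is probably worth checking the rows with a large distance by hand before accepting them.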
bstrain