I am trying to subset my data to create a list of possible duplicates in a new data frame. The problem is that the names are in different formats, and possibly only a small part of the name may actually match.
I need R to output a list of possible duplicates that I can then check by hand.
I've found a few examples that deal with formatting issues, or with cases where it's the first few characters that you are trying to match. I am not sure how to put that code together, and in my case the matching characters may be anywhere in the name.
So far, this question seems to get me the closest, but I'm still not sure how to adapt its code to work for me:
Subset a df using partial match with multiple criteria
This is what my data looks like (but with millions of rows):
Supplier.Name              Date.of.Record  BMCC.avg
SG & JM Hammond            2018-07-21      292.2381
Mileshan Nominees Pty Ltd  2018-12-21      130.0000
RW & GJ Brown & Sons       2018-02-21      162.8333
BD & BA Smith              2018-02-21      478.0000
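In case it helps, the sample rows above can be recreated as a data frame like this. The object name `df` is just a placeholder I picked, and I'm assuming the date column is stored as class `Date`:

```r
# Reproducible version of the four sample rows shown above
df <- data.frame(
  Supplier.Name  = c("SG & JM Hammond",
                     "Mileshan Nominees Pty Ltd",
                     "RW & GJ Brown & Sons",
                     "BD & BA Smith"),
  Date.of.Record = as.Date(c("2018-07-21", "2018-12-21",
                             "2018-02-21", "2018-02-21")),
  BMCC.avg       = c(292.2381, 130.0000, 162.8333, 478.0000),
  stringsAsFactors = FALSE
)

str(df)
```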
In the end, I would like a list of possible duplicates based on partial matches (maybe 4 or 5 characters in a row?).
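To make "4 or 5 characters in a row" concrete, this is roughly the comparison I have in mind, written out on a toy vector with base R only. It is just an illustration of the idea, not something I have working on my real data. The run length of 5, the helper name `char_runs`, and the last name in the vector (a made-up variant of the first one) are all my own assumptions for the example:

```r
# Toy names: the first four come from my sample; the last one is a
# hypothetical variant added only so that something should match
supplier <- c("SG & JM Hammond",
              "Mileshan Nominees Pty Ltd",
              "RW & GJ Brown & Sons",
              "BD & BA Smith",
              "Hammond SG and JM")

# All runs of n consecutive characters in a name, after lower-casing and
# dropping everything that is not a letter or a digit
char_runs <- function(x, n = 5) {
  x <- gsub("[^a-z0-9]", "", tolower(x))
  if (nchar(x) < n) return(character(0))
  starts <- seq_len(nchar(x) - n + 1)
  unique(substring(x, starts, starts + n - 1))
}

runs <- lapply(supplier, char_runs, n = 5)

# Compare every pair of names and flag pairs that share at least one run
pairs  <- t(combn(seq_along(supplier), 2))
shares <- apply(pairs, 1, function(p) {
  length(intersect(runs[[p[1]]], runs[[p[2]]])) > 0
})

possible_dups <- data.frame(
  name1 = supplier[pairs[shares, 1]],
  name2 = supplier[pairs[shares, 2]],
  stringsAsFactors = FALSE
)
possible_dups
```

On this toy vector only the two Hammond entries are flagged. It compares every pair of names, though, so I don't see how it could be used as-is on millions of rows.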
Right now I can't seem to put together working code at all. Even a few suggestions for a starting point would be helpful. Thanks!