I have a column of company names, and I would like to count how many different companies are in that column. Some companies appear under slightly different names. For example, these entries refer to the same company and should be counted only once:
- ASAHI INTECC CO., LTD.
- Asahi Intecc USA Inc
- ASAHI INTECC USA, INC
I want code that works in general: it should count the number of distinct companies precisely, without counting slightly different spellings of the same company as separate companies. For example, this reproducible data should return 6:
```r
# Reproducible example: 8 rows that should count as 6 distinct companies
company <- read.table(text = "
CompanyName
'MERCK SHARP & DOHME CORPORATION'
'GILEAD SCIENCES INC'
'BOEHRINGER INGELHEIM PHARMACEUTICALS, INC.'
'ABBVIE, INC.'
'JANSSEN SCIENTIFIC AFFAIRS, LLC'
'BOEHRINGER INGELHEIM PHARMA GMBH & CO.KG'
'ASAHI INTECC CO., LTD.'
'Asahi Intecc USA Inc'
", header = TRUE, stringsAsFactors = FALSE)
```
I looked at "How can I match fuzzy match strings from two datasets?", but I still don't know how to write the code for this case. Any advice would be appreciated.
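One possible approach in the fuzzy-matching spirit of the linked question is to normalize the names lightly and then group them by edit distance. The base-R sketch below does this with `utils::adist` (Levenshtein distance) and `stats::hclust`/`cutree`. The `legal_forms` word list, the helper `normalize_company`, and the relative-distance cutoff of 0.3 are hypothetical choices tuned by eye for these eight rows, not a general rule; expect to adjust them for real data.

```r
# Company names from the reproducible example in the question
company <- data.frame(
  CompanyName = c(
    "MERCK SHARP & DOHME CORPORATION",
    "GILEAD SCIENCES INC",
    "BOEHRINGER INGELHEIM PHARMACEUTICALS, INC.",
    "ABBVIE, INC.",
    "JANSSEN SCIENTIFIC AFFAIRS, LLC",
    "BOEHRINGER INGELHEIM PHARMA GMBH & CO.KG",
    "ASAHI INTECC CO., LTD.",
    "Asahi Intecc USA Inc"
  ),
  stringsAsFactors = FALSE
)

# Hypothetical list of legal-form words to ignore; extend it for your data
legal_forms <- c("INC", "LLC", "LTD", "CO", "CORP", "CORPORATION", "GMBH", "KG")

# Hypothetical helper: upper-case, turn every run of non-alphanumeric
# characters (commas, dots, "&") into one space, then drop legal-form tokens
normalize_company <- function(x, drop = legal_forms) {
  x <- trimws(gsub("[^A-Z0-9]+", " ", toupper(x)))
  tokens <- strsplit(x, " ", fixed = TRUE)
  vapply(tokens, function(tk) paste(tk[!tk %in% drop], collapse = " "),
         character(1))
}

clean <- normalize_company(company$CompanyName)

# Levenshtein distance between every pair of names, divided by the
# length of the longer name so that it lies between 0 and 1
lev <- adist(clean)
rel <- lev / outer(nchar(clean), nchar(clean), pmax)

# Complete linkage: every pair of names in a group must be within the
# cutoff, so a chain of loosely similar names cannot merge unrelated firms
hc <- hclust(as.dist(rel), method = "complete")
company$group <- cutree(hc, h = 0.3)  # 0.3 is an assumed, tunable cutoff

n_companies <- length(unique(company$group))
print(n_companies)
# [1] 6
```

On this sample, the two Boehringer names ("... PHARMACEUTICALS" vs "... PHARMA") and the two Asahi names ("... USA" vs without it) each end up at a relative distance of 0.25, below the cutoff, while every other pair is farther apart. Note that `adist` builds a full n × n matrix, so for a very large column you would likely need to split the names into blocks (for example by first word) before clustering.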