This may not be satisfactory, but it could give you a start. Most of the solution here is copied from this answer, with some additional explanation.
The hard part is that at some point you have to make a subjective decision about how close two labels should be in order to count as a cluster, and this will be hard to do for a large data set.
First we compute string distances:
s <- importers$V1 ## for brevity
d <- adist(s) ## compute Levenshtein distances
dimnames(d) <- list(s,s)
Visualize the distance matrix (obviously impractical for a huge set of names ...)
par(mar=c(6,1,1,6))
heatmap(d,Rowv=NA,Colv=NA,margins=c(12,12))

A human can easily tell that there are three clusters here. However, there's no easy cutoff in terms of string distance:
par(las=1)
plot(table(d),xlab="string distance",ylab="frequency")

The within-cluster and between-cluster distance distributions overlap ...
Now we do hierarchical clustering:
hc <- hclust(as.dist(d))
plot(hc)
rect.hclust(hc,k=3) ## select 3 clusters

Once we decide that there are 3 clusters here, the clustering algorithm selects the "correct" elements for each cluster.
Create a new column giving the cluster identity for each row:
importers$code <- cutree(hc,k=3)
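If you then want a single canonical label per cluster, one simple rule (just one possible choice) is to take the most frequent name within each cluster:

```r
## pick the most common name within each cluster as the "consensus" name
## (a sketch; any other rule, e.g. the shortest name, would also work)
consensus <- tapply(importers$V1, importers$code,
                    function(x) names(which.max(table(x))))
importers$clean_name <- consensus[as.character(importers$code)]
```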
As I suggested in comments, it might be better to do this job in OpenRefine: it could be hard to write a reliable, robust, completely automated method for doing this task.
Also: I don't know how well this will scale to a data set of 10,000 names. Hierarchical clustering itself is fast, but the distance matrix will be huge (about 50 million unique pairwise distances), which will take time to compute and space to store. (There are faster ways to compute Levenshtein distances than the built-in adist() ...)
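For example, the stringdist package computes Levenshtein distances much faster than adist(), and with a single vector argument stringdistmatrix() returns a dist object directly, so you skip building the full square matrix:

```r
## requires the stringdist package (install.packages("stringdist"))
library(stringdist)
d2 <- stringdistmatrix(s, method = "lv")  ## "lv" = Levenshtein; returns a dist
hc <- hclust(d2)                          ## drop-in replacement for as.dist(adist(s))
```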
A few suggestions for making the problem more computationally tractable (although it won't be easy in any case):
- you definitely shouldn't try to do the clustering on the full data set. Instead, extract the vector of unique importer names, cluster those, then join (merge) the cluster codes back with the full data set
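A sketch of that unique-then-merge approach (k = 3 is just the toy example's value; you'd have to choose it for your data):

```r
u <- unique(importers$V1)                  ## cluster only the unique names
hu <- hclust(as.dist(adist(u)))
lookup <- data.frame(V1 = u, code = cutree(hu, k = 3))
importers <- merge(importers, lookup, by = "V1")  ## join codes back to all rows
```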
- you might be able to do the problem with this (inefficient) batch algorithm:
  - split the data into subsets of importers (it will probably work best if you alphabetize the vector first); cluster each subset and reduce it to the consensus names within each cluster
  - join the subsets and re-cluster
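The batch scheme might be sketched like this (chunk_size, the cut height h, and the shortest-name consensus rule are all placeholders you would have to tune):

```r
batch_consensus <- function(names_vec, chunk_size = 500, h = 3) {
  names_vec <- sort(unique(names_vec))     ## alphabetize so similar names land together
  chunks <- split(names_vec,
                  ceiling(seq_along(names_vec) / chunk_size))
  unlist(lapply(chunks, function(s) {
    if (length(s) < 2) return(s)
    cl <- cutree(hclust(as.dist(adist(s))), h = h)   ## cut dendrogram at height h
    tapply(s, cl, function(x) x[which.min(nchar(x))]) ## shortest name as consensus
  }), use.names = FALSE)
}
## one pass reduces the name set; then re-cluster the reduced vector:
## reduced  <- batch_consensus(importers$V1)
## reduced2 <- batch_consensus(reduced)
```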