I have a table of company names that appear to have been entered through a free-text box. As a result, many companies have 3-5 entries, such as A Good Company, A Good Company LLC, AA Good Company, etc.

I know that if I were looking for one company I could use LIKE with % wildcards to get all the variations, but I would like to insert them into a new company table with just one row per company, covering all its variations, so that I can use it as a reference table going forward. Is there a way to do this within SQL, or in an outside application for that matter?
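For reference, the single-company LIKE lookup mentioned above might look roughly like this. It is only a sketch using Python's built-in sqlite3 module; the table name `raw_companies` and column `name` are hypothetical, since the real schema and DBMS are not stated.

```python
import sqlite3

# Hypothetical table and column names; the real schema is not given.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE raw_companies (name TEXT)")
conn.executemany(
    "INSERT INTO raw_companies (name) VALUES (?)",
    [
        ("A Good Company",),
        ("A Good Company LLC",),
        ("AA Good Company",),
        ("Other Corp",),
    ],
)


def find_variations(conn, fragment):
    """Return the distinct names that contain the fragment, sorted by name."""
    rows = conn.execute(
        "SELECT DISTINCT name FROM raw_companies "
        "WHERE name LIKE ? ORDER BY name",
        (f"%{fragment}%",),
    )
    return [row[0] for row in rows]


variations = find_variations(conn, "Good Company")
```

This works for one company at a time, but it requires knowing the fragment in advance, which is why it does not scale to the whole table.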

philipxy
samiracle
  • Your best bet is to put all the distinct company names into a spreadsheet and normalize them yourself. This is a hard problem. Although there are tools to help, if you have a few hundred or even a thousand or two names, just doing it manually is probably the fastest way to get it done. – Gordon Linoff Nov 05 '17 at 03:30
  • In future, please include the database type (e.g. MySQL, Oracle, Postgres) as a tag. The syntax for string manipulation varies a great deal between DBMS vendors, i.e. "sql" isn't sufficient by itself. – Paul Maxwell Nov 05 '17 at 03:38
  • Simply by doing a DISTINCT I end up with about 14k rows. When I check for names that appear at least twice, the number goes down to about 4k. At least 5 times, and the number goes down to 2.5k. – samiracle Nov 05 '17 at 03:38
  • I suggest you make a table of the noise words (A, AA, LLC, etc.), then use it to clean the company names, either into a new column or at runtime, and then merge the company names based on their cleaned values. – Mohammed Elshennawy Nov 05 '17 at 10:10
  • This is an interesting idea @MohammedElshennawy, I think I will give that a shot. – samiracle Nov 05 '17 at 15:35
  • Possible duplicate of [Approximate string matching algorithms](https://stackoverflow.com/questions/49263/approximate-string-matching-algorithms) – philipxy Nov 05 '17 at 19:21
  • I agree with Gordon. Do that manually. How else will you find mere typos ('A Good Conpany') or tell similar companies apart ('A Good Company' / 'A God Company' - typo or really two companies)? – Thorsten Kettner Nov 05 '17 at 19:48
  • @samiracle could you edit the question to include your desired result? – Yogesh Sharma Nov 06 '17 at 05:15
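
A minimal sketch of the noise-word idea from the comments, combined with approximate matching to flag likely typos for manual review. It is written in Python rather than SQL because the DBMS is not stated. The `NOISE_WORDS` set, the function names, and the 0.9 similarity threshold are all assumptions for illustration, not something given in the thread.

```python
import re
from collections import defaultdict
from difflib import SequenceMatcher

# Assumed noise words; extend this set for real data.
NOISE_WORDS = {"a", "aa", "llc", "inc", "ltd", "the"}


def clean_name(name):
    """Lowercase the name, strip punctuation, and drop noise words."""
    tokens = re.findall(r"[a-z0-9]+", name.lower())
    return " ".join(t for t in tokens if t not in NOISE_WORDS)


def group_by_clean_name(names):
    """Map each cleaned key to the list of raw names that produce it."""
    groups = defaultdict(list)
    for name in names:
        groups[clean_name(name)].append(name)
    return dict(groups)


def near_duplicates(keys, threshold=0.9):
    """Return pairs of cleaned keys similar enough to review by hand."""
    keys = sorted(keys)
    pairs = []
    for i, left in enumerate(keys):
        for right in keys[i + 1:]:
            if SequenceMatcher(None, left, right).ratio() >= threshold:
                pairs.append((left, right))
    return pairs


names = [
    "A Good Company",
    "A Good Company LLC",
    "AA Good Company",
    "A Good Conpany",
    "A God Company",
]
groups = group_by_clean_name(names)
review = near_duplicates(groups)
```

Each key in `groups` would become one row in the reference table, with the raw names pointing at it. The pairs in `review` are only candidates: as noted above, a script cannot tell whether 'A God Company' is a typo or a different company, so those still need a human decision. The pairwise comparison is quadratic, which is fine for a few thousand keys but would need blocking or indexing for much larger sets.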

0 Answers