-4

We are creating a database of companies, and we would like to have unique company name. However, sometimes user enter them differently, i.e. some day they might enter "xyz, inc" and the other day they might enter it as "xyz inc." though the name are logically same, but not for database comparison purposes.

Is there any way I can find similar name in a MySQL database? We've already considered using LIKE, but decided against it.

halfer
  • 19,824
  • 17
  • 99
  • 186
Sumit Gupta
  • 2,152
  • 4
  • 29
  • 46
  • Did you tried anything? Fulltext search engine, for example – Alma Do Apr 03 '14 at 12:04
  • actually I am just reading about how to do, and have no clue how to do it? I never use fulltext search, can you give me some guide. – Sumit Gupta Apr 03 '14 at 12:05
  • See [this](http://stackoverflow.com/questions/3338889/how-to-find-similar-results-and-sort-by-similarity) answer. – mvw Apr 03 '14 at 12:10
  • 1
    No - because it's only about - reading the manual. That [page](https://dev.mysql.com/doc/refman/5.0/en/fulltext-search.html) has all needed explanations together with good examples. I don't think that copy-pasting manual page to SO answer would have any sense. – Alma Do Apr 03 '14 at 12:10
  • @AlmaDo I really didn't expect a copy paste manual, but You really help me out here. Thanks. – Sumit Gupta Apr 03 '14 at 12:31
  • @mvw Yes, I was reading about such implementation, but they didn't make much sense to me. Sometime for starter such things are little complex to understand. But thanks, full text search did a decent job for my purpose at this moment. – Sumit Gupta Apr 03 '14 at 12:32

3 Answers3

2

A simple way to do this is to split your user input into words, and trim out punctuation:

"xyz, inc." becomes array("xyz", "inc")

Then you can do something like this:

SELECT
    *
FROM
    company
WHERE
    name LIKE '%xyz%'
    OR name LIKE '%inc%';

To improve this, you could run the query once per word, and add weightings to them (you would therefore need a weighting table, and it might say that "inc" is a common word and thus does not count significantly). More weightings and more results would increase the ranking of each result.

Bear in mind though that this strategy may not be optimal from a performance perspective, if you have a large table, since LIKE performs full table scans. If you have 10K rows you'll be fine, but if you have 1M rows, you might not be. As always, you should try it on a representative database to see what performance is like in your case.

This is just a quick-and-simple solution, but you would have better results if you were to install a specific search system, such as ElasticSearch or Lucene.

halfer
  • 19,824
  • 17
  • 99
  • 186
  • Yup your idea is good, and I really thought of it, but I was looking something better than this. My DB is not big today, but who knows :). Thanks for your answer. – Sumit Gupta Apr 03 '14 at 12:35
  • @Sumit: That's fair enough, although in general if some solutions have already been rejected, it is a good idea to note that in the question to start with. The next step is to consider Alma's recommendation of full-text search in the database (as this requires no additional software), and failing that, use a proper search indexer, as per my last paragraph. – halfer Apr 03 '14 at 12:38
  • (I've edited your question to reflect that. Perhaps you could edit it further to explain _why_ a `LIKE` solution is not good enough). – halfer Apr 03 '14 at 12:40
  • Thanks for editing, and thanks for guiding. Yes, as I mention in my comment I am going with his solution. – Sumit Gupta Apr 03 '14 at 12:47
1
Select * From table Where company Like '%xyz%'

Select * From table Where company Like '%xyz' OR company Like '%inc'

Select * From table Where company Like 'xyz%' AND company Like 'inc%'

Use the % sign to search from at first side or place the % sign in last to match from last or put both side to match exact one.

halfer
  • 19,824
  • 17
  • 99
  • 186
0
Select * From table Where company Like '%xyz%'

Select * From table Where company Like '%xyz%' OR company Like '%inc%'

Select * From table Where company Like 'xyz%' AND company Like '%inc%'

Last one would work the best in your case