
Possible duplicate of: How to use prefix wildcards like '*abc' with match-against

I could not find my answer there, so I am asking this question. Sorry for the duplicate.

I am running a query in MySQL.

The records are:

  1. I am john doe.
  2. John doe is a man.
  3. John last name is doe

LIKE '%john d%' matches the first two records, because the words appear there in the same order and the wildcards allow the match anywhere in the record. On a large data set, however, this has killed performance.
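To make the desired behaviour concrete, here is a minimal Python sketch (not MySQL itself) that mimics what LIKE '%john d%' does on the three sample records, assuming a case-insensitive comparison as with MySQL's default collations. The names `records` and `like_contains` are made up for illustration.

```python
# The three sample records from the question.
records = [
    "I am john doe.",
    "John doe is a man.",
    "John last name is doe",
]

def like_contains(rows, term):
    """Mimic LIKE '%term%' with a case-insensitive substring test (hypothetical helper)."""
    needle = term.lower()
    return [row for row in rows if needle in row.lower()]

# Only rows containing the exact character sequence "john d" are kept.
matches = like_contains(records, "john d")
```

Record 3 is excluded because "john" is followed by " l", not " d". The cost is that every row has to be scanned, which is why this approach is slow on large tables.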

So I googled and found MATCH ... AGAINST IN BOOLEAN MODE as an alternative. My search term is: john d. I tried:

AGAINST('"john d"')
AGAINST('john d*')
AGAINST('+john +d') etc

I only want results in which the words appear in the same order as in the search term john d (e.g. 1. I am john doe. 2. John doe is a man.), but I cannot achieve this with MATCH ... AGAINST. LIKE '%john d%' gives the desired result, but it kills performance. How can I get this result in MySQL with fast performance?

In the possible duplicate (How to use prefix wildcards like '*abc' with match-against), @GolezTrol suggested creating a separate column that holds the reversed strings:

user_login user_login_rev
xyzabc     cbazyx

Then, instead of looking for '%john d' in the original column, we can look for 'd nhoj%' in the reversed column, which is much faster if that column is indexed.

But @PeerBr warns that inverting strings will not help if you want to find text in the middle of the string: you won't find "Jimmy Blue Jones" by typing 'Blue%' using normal indices, nor by inverting 'Blue%' using inverted indices.
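A rough Python sketch of the reversed-column idea and its limitation, assuming a sorted list stands in for the database index. The helper names `build_reversed_index` and `find_by_suffix` and the extra sample logins are hypothetical.

```python
from bisect import bisect_left

def build_reversed_index(values):
    """Return (reversed_string, original) pairs sorted by the reversed string."""
    return sorted((value[::-1], value) for value in values)

def find_by_suffix(index, suffix):
    """Find originals ending with `suffix` via a prefix scan over the reversed keys."""
    prefix = suffix[::-1]
    # A 1-tuple sorts before any pair whose key starts with the same prefix,
    # so this jumps to the first possible match.
    start = bisect_left(index, (prefix,))
    found = []
    for reversed_key, original in index[start:]:
        if not reversed_key.startswith(prefix):
            break  # keys are sorted, so no later key can match
        found.append(original)
    return found

logins = ["xyzabc", "john d", "abcxyz", "zzabc"]
index = build_reversed_index(logins)

# Suffix search '%abc' becomes the prefix search 'cba%' on the reversed keys.
suffix_matches = find_by_suffix(index, "abc")

# Text in the middle of a string is not reachable this way.
middle_matches = find_by_suffix(build_reversed_index(["Jimmy Blue Jones"]), "Blue")
```

The suffix lookup only touches the matching range of the sorted keys, but the "Blue" lookup returns nothing, which is the problem with a search term like john d that can sit anywhere in the record.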

Thanks

shery089

1 Answer


For Solr, this should work nicely with a field that uses a KeywordTokenizer and a ReversedWildcardFilter:

<fieldType name="c_string" class="solr.TextField">
 <analyzer type="index">
  <tokenizer class="solr.KeywordTokenizerFactory"/>
  <filter class="solr.ASCIIFoldingFilterFactory"/>
  <filter class="solr.LowerCaseFilterFactory" />
  <filter class="solr.ReversedWildcardFilterFactory" />
 </analyzer>
 <analyzer type="query">
  <tokenizer class="solr.KeywordTokenizerFactory"/>
  <filter class="solr.ASCIIFoldingFilterFactory"/>
  <filter class="solr.LowerCaseFilterFactory" />
  <filter class="solr.ReversedWildcardFilterFactory" />
 </analyzer>
</fieldType>

Depending on your use case, you can drop the ASCIIFoldingFilterFactory. The LowerCaseFilterFactory ensures that the string is lowercased, while the KeywordTokenizer keeps the whole string as a single token, so that you don't match case #3 in your examples.
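As a toy illustration of why the single token matters (plain Python, not Solr's actual analysis or query code), compare word-level matching with wildcard matching against one lowercased token per record. The function names are hypothetical, and `fnmatchcase` merely stands in for a wildcard query.

```python
from fnmatch import fnmatchcase

# The three sample records from the question.
records = [
    "I am john doe.",
    "John doe is a man.",
    "John last name is doe",
]

def matches_word_query(text):
    """Toy version of '+john +d*': both terms required, word order ignored."""
    tokens = text.lower().split()
    return "john" in tokens and any(token.startswith("d") for token in tokens)

def matches_keyword_wildcard(text, pattern="*john d*"):
    """Toy version of a wildcard query against one lowercased token per record."""
    return fnmatchcase(text.lower(), pattern)

word_hits = [r for r in records if matches_word_query(r)]
keyword_hits = [r for r in records if matches_keyword_wildcard(r)]
```

With word-level tokens all three records qualify, because word order is lost. With one token per record, only the records containing "john d" as a contiguous sequence remain.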

The ReversedWildcardFilter also stores each token in reversed form. When a query starts with a wildcard, it can be run as a prefix search against the reversed tokens, so you still get good performance from the index.

MatsLindh