
The Euro (€) and Pound (£) signs don't seem to get indexed, even when I explicitly add them to my charset_table:

charset_table=...  U+20AC->U+20AC, U+00A3->U+00A3

I even tried mapping them to the dollar sign:

U+0024->U+0024, U+20AC->U+0024, U+00A3->U+0024

Yet in every case they are not recognized. In other words, MATCH('£1000') will not find 'cost is £1000', and if I map them to $ as in the second example, MATCH('$1000') will not find it either.
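What the second mapping is meant to achieve can be illustrated with a deliberately simplified model of tokenization (a hypothetical sketch, not Sphinx's actual tokenizer): characters listed in the charset_table count as word characters and are replaced by their mapping target, and every other character splits words. Under that model, folding € and £ to $ turns 'cost is £1000' into the token `$1000`, which is what MATCH('$1000') would then look for.

```python
# Hypothetical, simplified model of a charset_table: characters present in the
# table are word characters (possibly remapped); anything else is a separator.
CHARSET = {
    **{c: c for c in "0123456789abcdefghijklmnopqrstuvwxyz"},  # 0..9, a..z
    **{c: c.lower() for c in "ABCDEFGHIJKLMNOPQRSTUVWXYZ"},     # A..Z->a..z
    "$": "$",       # U+0024->U+0024
    "\u20ac": "$",  # U+20AC->U+0024 (euro sign)
    "\u00a3": "$",  # U+00A3->U+0024 (pound sign)
}


def tokenize(text: str, charset: dict[str, str]) -> list[str]:
    """Split text into tokens, applying the charset mapping to each character."""
    tokens, current = [], []
    for ch in text:
        mapped = charset.get(ch)
        if mapped is None:
            # Not in the table: acts as a word separator.
            if current:
                tokens.append("".join(current))
                current = []
        else:
            current.append(mapped)
    if current:
        tokens.append("".join(current))
    return tokens


print(tokenize("cost is \u00a31000", CHARSET))
```

If the currency entries are missing from the table, the same model treats £ as a separator and the text yields the token `1000` instead, so a query for `$1000` or `£1000` would not match it.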

If I run a MySQL search with WHERE field LIKE '%£%', however, I do get records, which leads me to believe that MySQL is storing UTF-8 correctly. In other words, the Pound and Euro characters are stored correctly in MySQL, but the Sphinx index does not recognize them, even after I explicitly add their Unicode code points to my charset_table.
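A LIKE match only shows that the query and the column agree with each other; it does not by itself show the stored bytes are plain UTF-8. One way to check is to compare the raw bytes (for example from MySQL's `HEX()` function on the column) against the expected UTF-8 encodings of the two signs. A small Python sketch that prints those expected values:

```python
def utf8_hex(ch: str) -> str:
    """Return the UTF-8 encoding of a character as uppercase hex."""
    return ch.encode("utf-8").hex().upper()


for name, ch in [("EURO SIGN", "\u20ac"), ("POUND SIGN", "\u00a3")]:
    # EURO SIGN: U+20AC -> E282AC, POUND SIGN: U+00A3 -> C2A3
    print(f"{name}: U+{ord(ch):04X} -> {utf8_hex(ch)}")
```

If the hex stored in the column around the currency sign differs from these sequences, the data is not plain UTF-8, and Sphinx would see different characters than the ones listed in charset_table.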

Relevant portion of config:

min_stemming_len = 1
stopword_step = 0
html_strip  = 1
min_word_len = 1
min_infix_len = 0
index_zones = title,description
charset_type = utf8mb4_unicode_ci
charset_table = 0..9, A..Z->a..z, _, a..z, U+0026->U+0026, U+0027->U+0027, U+002E->U+002E, U+002D->U+002D, U+2014->U+002D#, U+2019->U+0027, U+0024->U+0024, U+20AC->U+0024, U+00A3->U+0024
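In the charset_table line above, `U+2014->U+002D` is followed by a stray `#`. Assuming sphinx.conf treats an unescaped `#` as the start of a comment running to the end of the line (worth confirming in the documentation for your Sphinx version), everything after it, including the `$`, `€` and `£` entries, would never reach the indexer. The Python sketch below checks which code points that line explicitly maps under this assumption; it only looks at `U+XXXX->...` entries and ignores ranges such as `a..z`.

```python
import re

# The charset_table line from the config above, verbatim.
CONFIG_LINE = (
    "charset_table = 0..9, A..Z->a..z, _, a..z, U+0026->U+0026, U+0027->U+0027, "
    "U+002E->U+002E, U+002D->U+002D, U+2014->U+002D#, U+2019->U+0027, "
    "U+0024->U+0024, U+20AC->U+0024, U+00A3->U+0024"
)


def strip_comment(line: str) -> str:
    # Assumption: an unescaped '#' starts a comment that runs to end of line.
    m = re.search(r"(?<!\\)#", line)
    return line[: m.start()] if m else line


def mapped_sources(line: str) -> set[int]:
    """Code points that appear as the source of an explicit U+XXXX->... entry."""
    value = strip_comment(line).split("=", 1)[1]
    return {int(src, 16) for src in re.findall(r"U\+([0-9A-Fa-f]+)\s*->", value)}


sources = mapped_sources(CONFIG_LINE)
for ch in ("\u20ac", "\u00a3", "$"):
    status = "mapped" if ord(ch) in sources else "NOT mapped"
    print(f"U+{ord(ch):04X}: {status}")
```

Under that assumption all three print as NOT mapped; removing the `#` (and rebuilding the index) would bring the currency entries back into effect.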

Confirmed that the table/column is using utf8mb4_unicode_ci

Confirmed I can do a mysql search on Euro: Where Title like '%€%'

Confirmed I cannot find same record with SphinxQL: where MATCH('€')

user3649739

1 Answer


There are three things you should check:

First, look at this question to check your MySQL character encoding.

Second, check that charset_type in your Sphinx config matches it.

Finally, remember that after any change to charset_type or charset_table you need to rebuild your indexes.
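Rebuilding is done with Sphinx's `indexer` tool. A minimal Python sketch, assuming `indexer` is on the PATH; the config path and the `rebuild_indexes` helper are hypothetical, and `--rotate` is passed so that a running searchd picks up the rebuilt indexes:

```python
import subprocess


def rebuild_indexes(config_path: str, dry_run: bool = True) -> list[str]:
    """Build (and optionally run) the indexer command that rebuilds all indexes."""
    cmd = ["indexer", "--config", config_path, "--all", "--rotate"]
    if not dry_run:
        # Raises CalledProcessError if indexer exits with a non-zero status.
        subprocess.run(cmd, check=True)
    return cmd


print(" ".join(rebuild_indexes("/etc/sphinxsearch/sphinx.conf")))
```

Call it with `dry_run=False` to actually run the rebuild once the command looks right.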

If none of the above helps, you could post your Sphinx config here, which might give further clues to the problem.

Josh Greifer
Greller OK, I did check the character encoding, and for the table/column in question I in fact changed it from `utf8mb4_general_ci` to `utf8mb4_unicode_ci`, based on comments at http://stackoverflow.com/questions/766809/whats-the-difference-between-utf8-general-ci-and-utf8-unicode-ci saying that general "fails to implement all of the Unicode sorting rules.. such as when using particular languages or characters." Then, in addition to specifying the charset_table entries for the Euro, I also specified charset_type. No luck. Yet a MySQL search for the Euro symbol still works. – user3649739 Apr 21 '17 at 20:14