I'm creating a website design in PHP with aquarium ads - www.akwa-market.pl I wanted the site to be international. I have created a database and tables, one of them is with classifieds entries. I also have a built-in word search. Suppose I typed in the polish word turtle - żółw I run the command:
SELECT FROM * table_name WHERE item REGEXP '[[:<:]]{$keyword}[[:>:]]';
or:
SELECT FROM * table_name WHERE item REGEXP '{$keyword}';
and find a record in the database that contains that word. The problem occurs when I enter a word in Russian, for example Привет The search engine does not show a record with the given Russian word.
After connecting to the database, I call.
mysqli_set_charset($db, "utf8mb4");
mysqli_query($db, "SET NAMES utf8mb4");
mysqli_query($db, "SET CHARACTER SET utf8mb4");
I tried to set in db the string comparison method to utf8mb4_bin
or to utf8mb4_unicode_ci
but this does not give successful results. It is still possible to search only words containing Latin characters, including diacritics of European languages.
I don't know how to set the query to be able to search for words in other languages, e.g. Russian, Chinese or Arabic.
I mean to use it together with REGEXP
, preferably to search for several words at the same time, separated by the 'I' sign, i.e. OR, where, for example, the $keyword variable takes the value:
$keyword = 'Привет | Андрей';
P.S. Additionally, I don't know how to solve the case when someone types - polish word tortoise - zółw with z letter without dot, so that the word tortoise - żółw can be searched. Or even written without pressing ALT, e.g. zolw
Can anyone help? The site is almost functioning normally there is no errors given by the server, but the search engine is not working correctly and I don't really know how to solve it. Sorry, but English is my second language.
Edit: Output show create table as user @ysth asked.
CREATE TABLE `items` (
`index` int(11) NOT NULL AUTO_INCREMENT,
`id` varchar(11) COLLATE utf8_unicode_ci NOT NULL,
`user_name` varchar(30) COLLATE utf8_unicode_ci DEFAULT NULL,
`user_id` varchar(30) COLLATE utf8_unicode_ci DEFAULT NULL,
`price` decimal(10,2) DEFAULT NULL,
`main_category` varchar(50) COLLATE utf8_unicode_ci NOT NULL,
`category` varchar(50) COLLATE utf8_unicode_ci NOT NULL,
`sub_category` varchar(50) COLLATE utf8_unicode_ci NOT NULL,
`description` text CHARACTER SET utf8mb4 COLLATE utf8mb4_bin DEFAULT NULL,
`country` varchar(50) COLLATE utf8_unicode_ci NOT NULL,
`voivodeship` varchar(50) COLLATE utf8_unicode_ci NOT NULL,
`town` varchar(50) COLLATE utf8_unicode_ci NOT NULL,
`latitude` decimal(11,8) DEFAULT NULL,
`longitude` decimal(11,8) DEFAULT NULL,
`email` varchar(50) COLLATE utf8_unicode_ci NOT NULL,
`entry_start` timestamp NULL DEFAULT NULL,
`entry_expire` timestamp NULL DEFAULT NULL,
`adv_type` varchar(30) COLLATE utf8_unicode_ci NOT NULL,
`adv_status` varchar(20) COLLATE utf8_unicode_ci DEFAULT NULL,
`adv_views` decimal(9,0) DEFAULT 0,
`users_report` text COLLATE utf8_unicode_ci DEFAULT NULL,
PRIMARY KEY (`index`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci
as I see now the table description column where the Russian text was saved has different collation utf8mb4_bin, if that is the case of problem sorry for disturbing, I have to check that right now and let You know.