I have a copy of the Geonames database stored in a MySQL database, and a PHP application that allows users to search the database for their city. It works fine if they type the city name in English, but I want them to be able to search in their native language.
For example, instead of asking a Japanese speaker to search for Tokyo
, they should be able to search for 東京
.
The Geonames database contains an alternatenames
column with, "alternatenames, comma separated, ascii names automatically transliterated, convenience attribute from alternatename table, varchar(10000)
."
For example, the alternatenames
value for the Tokyo
row is Edo,TYO,Tochiu,Tocio,Tokija,Tokijas,Tokio,Tokió,Tokjo,Tokyo,Toquio,Toquio - dong jing,Toquio - æ±äº¬,Tòquio,Tókýó,Tóquio,TÅkyÅ,dokyo,dong jing,dong jing dou,tokeiyw,tokkiyo,tokyo,twkyw,twqyw,Τόκιο,Токио,Токё,Токіо,ÕÕ¸Õ¯Õ«Õ¸,טוקיו,توكيو,توکیو,طوكيو,ܛܘܟÜܘ,ܜܘܟÜܘ,टोकà¥à¤¯à¥‹,டோகà¯à®•à®¿à®¯à¯‹,โตเà¸à¸µà¸¢à¸§,ტáƒáƒ™áƒ˜áƒ,东京,æ±äº¬,æ±äº¬éƒ½,ë„ì¿„
.
Those values don't contain 東京
exactly, but I'm guessing that they contain a form of it that has been encoded or converted in some way. So, I assuming that if I perform the same encoding/conversion on my search string, then I'll be able to match the row. For example:
mysql_query( sprintf( "
SELECT * FROM geoname
WHERE
MATCH( name, asciiname, alternatenames )
AGAINST ( %s )
LIMIT 1",
iconv( 'UTF-8', 'ASCII', '東京' )
) );
The problem is that I don't know what that conversion would be. I've tried lots of combinations of iconv()
, mb_convert_string()
, etc, but with no luck.
The MySQL table looks like this:
CREATE TABLE `geoname` (
`geonameid` int(11) NOT NULL DEFAULT '0',
`name` varchar(200) DEFAULT NULL,
`asciiname` varchar(200) DEFAULT NULL,
`alternatenames` mediumtext,
`latitude` decimal(10,7) DEFAULT NULL,
`longitude` decimal(10,7) DEFAULT NULL,
`fclass` char(1) DEFAULT NULL,
`fcode` varchar(10) DEFAULT NULL,
`country` varchar(2) DEFAULT NULL,
`cc2` varchar(60) DEFAULT NULL,
`admin1` varchar(20) DEFAULT NULL,
`admin2` varchar(80) DEFAULT NULL,
`admin3` varchar(20) DEFAULT NULL,
`admin4` varchar(20) DEFAULT NULL,
`population` int(11) DEFAULT NULL,
`elevation` int(11) DEFAULT NULL,
`gtopo30` int(11) DEFAULT NULL,
`timezone` varchar(40) DEFAULT NULL,
`moddate` date DEFAULT NULL,
PRIMARY KEY (`geonameid`),
KEY `timezone` (`timezone`),
FULLTEXT KEY `namesearch` (`name`,`asciiname`,`alternatenames`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8mb4
Can anyone point me in the right direction?