11

I have a MySQL database with words containing accents in Spanish (áéíóú). I'd like to know if there's any way to do a diacritic insensitive search. For instance, if I search for "lapiz" (without accent), i'd like to get results containing the word "lápiz" from my db. The way I'm currently doing the query is as follows:

$result = mysql_query("SELECT * FROM $lookuptable WHERE disabled = '0' AND name LIKE '%$q%' OR productCode LIKE '%$q%' LIMIT $sugglimit");

This is for an online store, so I don't know what people will be searching for... "lapiz" is just and example.

alt text http://www.freeimagehosting.net/uploads/0e7c2ae7d5.png

Thanks!

Hectorcr7
  • 113
  • 1
  • 1
  • 5

5 Answers5

22

Character sets & collations, not my favorites, but they DO work:

mysql> SET NAMES latin1;
mysql> SELECT 'lápiz' LIKE 'lapiz';
+-----------------------+
| 'lápiz' LIKE 'lapiz' |
+-----------------------+
|                     0 | 
+-----------------------+
1 row in set (0.01 sec)

mysql> SET NAMES utf8;
mysql> SELECT 'lápiz' LIKE 'lapiz';
+-----------------------+
| 'lápiz' LIKE 'lapiz' |
+-----------------------+
|                     1 | 
+-----------------------+


mysql> SET NAMES latin1;
mysql> SELECT _utf8'lápiz' LIKE _utf8'lapiz' ;
+---------------------------------+
| _utf8'lápiz' LIKE _utf8'lapiz' |
+---------------------------------+
|                               1 | 
+---------------------------------+

A nice chapter to read in the manual:Character Set Support

Wrikken
  • 69,272
  • 8
  • 97
  • 136
8

If you set the table's charset to UTF-8 and the collation to utf8_*_ci (_ci means "case insensitive) MySQL will perform case and accent-insensitive searches by default

Read more about charsets and collations here:
http://dev.mysql.com/doc/refman/5.1/en/charset-charsets.html

I tested it and

"lapiz" matches: "lápiz," "lapíz," and "lapiz"
"nino" matches: "niño," "ninó," and "nino"

You can set up the collation of your table upon creation:

CREATE TABLE table ( ... )
CHARACTER SET uft8 COLLATE utf8_general_ci;

Or you can ALTER it if it already exists.For more info, read the manual (link above).
If you are using phpMyAdmin, you can select the collation when you create your table.

NullUserException
  • 83,810
  • 28
  • 209
  • 234
  • How do I set the table? I'm not very familiar with mySQL. Thanks! – Hectorcr7 Jul 21 '10 at 23:23
  • I've modified my table (CubeCart_inventory) to uft_general8_ci and still doesn't work well. I modified my table with phpmyadmin (refer to image in original post). My webpage is www.carrodelectronica.com – Hectorcr7 Jul 22 '10 at 01:51
6

You can force the column name to convert as UTF8. I haven't tried is for Spanish but rather for Romanian characters with accents, but I assume it's the same thing.

The query I use is:

SELECT CONVERT('gîgă' USING utf8) LIKE '%giga%'

Or in the more likely case of looking up a column in a table, you can use:

SELECT CONVERT(column_name USING utf8) FROM table_name LIKE '%giga%'
Radu
  • 133
  • 1
  • 7
2

Just in case someone else stumbles upon this issue, I have found a way that solves the problem, at least for me, without messing with character sets and collations inside MySQL queries.

I am using PHP to insert and retrieve records from the database. Even though my Database, tables and columns are utf8, as well as the encoding of the PHP files, the truth is that the encoding used in the connection between PHP and MySQL is being made using latin1. I managed to find this using $mysqli->character_set_name(); where $mysqli is your object.

For the searches to start working as expected, returning accent insensitive and case insentive records for characters with accents or not, I have to explicitly set the character set of the connection.

To do this, you just have to do the following: $mysqli->set_charset('utf8'); where $mysqli is your mysqli object. If you have a database management class that wraps your database functionality, this is easy to apply to a complete app. If not, you have to set this explicitly everywhere you open a connection.

I hope this helps someone out, as I was already freaking out about this!

Miguel Mesquita Alfaiate
  • 2,851
  • 5
  • 30
  • 56
2

Store a second version of the string that has been stripped of diacritics?

Amber
  • 507,862
  • 82
  • 626
  • 550
  • I don't know if I explained myself. Let's say $q is equal to "lapiz". I need to get back results from db that contain "lápiz". I think you are talking when $q is equal to "lápiz" and then you strip the diacritics, which is the opposite. Thanks! – Hectorcr7 Jul 21 '10 at 22:55
  • @Hector What Amber meant is store a copy of the string without the accents in MySQL. Then you can always perform a search on that column. That is not needed though, answer forthcoming. – NullUserException Jul 21 '10 at 23:00
  • I heard there's a way to set mySQL to UTF8 and that way you can do diacritic insensitive comparison. Do you know anything about it? – Hectorcr7 Jul 21 '10 at 23:08