2

I'm using jQuery's autocomplete function on my Norwegian site. When I type the Norwegian characters æ, ø and å, the autocomplete function suggests words that contain the respective character, but not words that start with it. It seems that I've managed to encode Norwegian characters in the middle of words, but not at the start of words.

I'm using a PHP script with my own function for encoding Norwegian characters to UTF-8 and generating the autocomplete list.

This is really frustrating!

Code:

PHP code:

$q = strtolower($_REQUEST["q"]);
if (!$q) return;

function rewrite($string){

 $to = array('%E6','%F8','%E5','%F6','%EB','%E4','%C6','%D8','%C5','%C4','%D6','%CB', '%FC', '+', ' ');
 $from = array('æ', 'ø', 'å', 'ä', 'ö', 'ë', 'æ', 'ø', 'å', 'ä', 'ö', 'ë', '-', '-');

 $string = str_replace($from, $to, $string);

 return $string;
}

$items is an array containing the suggestion words.

foreach ($items as $key=>$value) {
  if (strpos(strtolower(rewrite($key)), $q) !== false) {
    echo utf8_encode($key)."\n";
  }
}

jQuery code:

$(document).ready(function () {
    $("#autocomplete").autocomplete("/search_words.php", {
        position: 'after',
        selectFirst: false,
        minChars: 3,
        width: 240,
        cacheLength: 100,
        delay: 0
    });
});
Peter Mortensen
jorgen
  • Are you getting valid results when using a non-encoded character in the first position? And how are you building the return list? It might help to see that code/query. – menkes Dec 29 '09 at 16:16
  • Why are you using a custom encoding script? Why not post the data in UTF-8 with Ajax? – dfilkovi Dec 29 '09 at 16:35

4 Answers

7

The bug (I think):

  • strtolower() will not lowercase special characters.
  • Therefore, capital special characters (Ä, Æ, Ø, Å etc.) are not converted by your rewrite() function.

If I understand the code correctly, a query for Øygarden (note the capital Ø) would leave the first character in its original form, Ø, but you are matching against the urlencode()d form, which should be %C3%98.

You should use mb_convert_case() specifying UTF-8 as the encoding.
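To illustrate the suggestion, here is a minimal sketch. It assumes the source file and the input are UTF-8 and that the mbstring extension is loaded; the helper name lowercase_utf8() is hypothetical and not part of the question's code.

```php
<?php
// Hypothetical helper: lowercase a UTF-8 string, including non-ASCII letters.
// Assumes the mbstring extension is available and the input is valid UTF-8.
function lowercase_utf8(string $s): string
{
    return mb_convert_case($s, MB_CASE_LOWER, 'UTF-8');
}

$query = 'Øygarden';

// Multibyte-aware lowercasing: the capital Ø becomes ø.
echo lowercase_utf8($query), "\n";   // prints "øygarden"

// The URL-encoded UTF-8 form of the capital letter mentioned above.
echo urlencode('Ø'), "\n";           // prints "%C3%98"
```

Applying the same lowercasing to both the query and the suggestion words before comparing them should make the match independent of the case of æ, ø and å.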

Let me know whether this solves it.

General re-writing suggestions

Your code could be replaced 100% using standard PHP functions, which can handle all Unicode characters instead of just those you specify, thus being less prone to bugs. I think the functionality of your custom rewrite() function could be replaced by

you would then get proper UTF-8 encoded data that you no longer need to utf8_encode(). That could give a cleaner approach that works for all characters, and it may also fix whatever bug there is (if the bug is in your code).
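As an illustration of matching with standard multibyte functions only, here is a rough sketch. It is one possible approach under my own assumptions, not necessarily the replacement meant above: the helper name match_suggestions() is hypothetical, the suggestions are passed as a plain list of UTF-8 strings (in the question they are the array keys), and the mbstring extension is assumed to be loaded.

```php
<?php
// Hypothetical helper: return every suggestion that contains the query,
// ignoring case. Assumes all strings are UTF-8 and mbstring is loaded.
function match_suggestions(array $items, string $q): array
{
    $matches = [];
    foreach ($items as $word) {
        // mb_stripos() is a case-insensitive, multibyte-aware strpos().
        if (mb_stripos($word, $q, 0, 'UTF-8') !== false) {
            $matches[] = $word;
        }
    }
    return $matches;
}

$items = ['Øygarden', 'Ålesund', 'Bergen', 'Tromsø'];

// One suggestion per line, already UTF-8, so no utf8_encode() is needed.
foreach (match_suggestions($items, 'øyg') as $word) {
    echo $word, "\n";
}
```

With this kind of comparison no custom character table is needed, so characters that were never listed in rewrite() are handled the same way as æ, ø and å.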

Pekka
0

I'm using a similar configuration but with Danish characters (æ, ø and å) and I do not have a problem with any characters. Are you sure you are encoding all characters correctly?

My response contains a | delimited list of values. All values are UTF-8 encoded (that's how they are stored in the database), and I set the content type to text/plain; charset=utf-8 using PHP's header() function. The last bit is not needed for it to work though.
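A minimal sketch of such a response, assuming the values are already UTF-8 and are simply joined with |. The helper name build_response() and the sample values are made up for illustration, and the exact layout your autocomplete plugin expects may differ.

```php
<?php
// Hypothetical helper: join UTF-8 values into a |-delimited response body.
function build_response(array $values): string
{
    return implode('|', $values);
}

// Declare the charset before any output is sent.
// The guard avoids a warning if output has already started.
if (!headers_sent()) {
    header('Content-Type: text/plain; charset=utf-8');
}

// Sample values standing in for rows read from the database.
echo build_response(['æble', 'øl', 'ål']), "\n";
```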

  • Frank
fmk_wa
0

Thank you for all answers and help. I certainly learned some new things about PHP and encoding :)

But the solution that worked for me was this:

I found out that the jQuery autocomplete function actually UTF-8 encodes and lowercases special characters before sending them to the PHP script. So when I write out the array of suggestions, I use my rewrite() function to encode the special characters, and in my compare function I only have to lowercase everything.

Now it works great!

Peter Mortensen
jorgen
0

I had a similar problem. The solution in my case was to use PHP's urldecode() function to convert the string back to its original form and then send the query to the database.
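Roughly like this, as a sketch: the helper name decode_query() and the sample input are made up, and the string is assumed to be percent-encoded UTF-8. Note that PHP already decodes the values in $_GET and $_REQUEST, so this only helps when the string arrives still encoded (for example, when it was encoded twice).

```php
<?php
// Hypothetical helper: turn a percent-encoded query back into plain UTF-8 text.
function decode_query(string $raw): string
{
    // urldecode() converts %XX sequences to bytes and "+" to a space.
    return urldecode($raw);
}

// Sample input: "øygarden" as percent-encoded UTF-8.
$term = decode_query('%C3%B8ygarden');

echo $term, "\n";   // prints "øygarden"
```

The decoded $term is what would then go into the database query, preferably as a bound parameter.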

m1k3y3