0

The situation: I'm developing a PHP Script which connects to a mssql database via odbc interface. After matching data gathered, these are transmitted through a REST interface to an external server. The script runs on a Windows client. Up to this point everything works fine.

The problem: Before I sent the data, I use the function urlencode() to convert german special chars like ö,ä,ü and ß. For some reason this doesn't work for data which was read out from the database. The following works fine:

echo urlencode("Münzener"); 

Equals: "M%C3%BCnzener" which is correct.

Now I want to encode the result from the database:

$connection_string = "DRIVER={SQL Server};SERVER=".LOCAL_HOST.";DATABASE=".LOCAL_DATABASE; 
$conn = odbc_connect($connection_string, LOCAL_USER, LOCAL_PASSWORD);

$sqlH = odbc_exec($conn, "SELECT field FROM table; ");

while($row = odbc_fetch_array($sqlH)) {

    /* var_dump($row["field"]) equals string(8) "Münzener"*/
    echo urlencode($row["field"]); 

}

Equals: "M%81nzener" which is not correct.

I know there are many topics on stackoverflow which deal with similar problems. As a result of this, i tried the following:

1) detect charset and convert it to UTF-8. Result: mb_detect_encoding() says, i've got ASCII. iconv('ASCII', 'UTF-8', $string); returns PHP Notice:

iconv(): Detected an illegal character in input string

if a add UTF-8//ignore the chars are missing. UTF-8//translit returns different chars. mb_convert_encoding() behaves in the same way.

2) the function utf8_encode() convert the string to "M%C2%81nzener" which is not correct. "%C2%81" looks better, but it is not "%C3%BC" which is correct.

3) I try to pass the charset in odbc_connect() method. Unlikely nothing changed. Last year I have had almost the same problem with a csv file. So I don't think that's the problem.

So my main question is: what is wrong with the encoding in this case? Is there something else in addition of the encoding which causes problems like that in urlencode()?

Tobias Bambullis
  • 736
  • 5
  • 17
  • 45
  • 1) Read `mb_detect_encoding()` and `utf8_encode()` manual pages and you'll realise they don't do what you assume they'd do judging by name 2) I see no reason to suspect anything weird: if `urlencode()` produces different output you're feeding it with different input. I bet you haven't decided your application charset and you're just using defaults everywhere. – Álvaro González Mar 15 '16 at 13:22
  • 1
    `echo bin2hex($row['field'])` – what does that give you? – In a nutshell: your odbc connection does not return the data in the same encoding as you expect, it's returning it in some weird speciality encoding of some ANSI codepage probably. `iconv` from ASCII doesn't work because ASCII doesn't contain the letter ü. Detecting the encoding is unreliable anyway, so don't pay any attention to it. `utf8_encode` only works with Latin-1, which apparently odbc doesn't return. – deceze Mar 15 '16 at 13:24
  • it gives string(32) "54686f6d61732d4d816e747a65722d53" which originally stands for "Thomas-Müntzer-S". Now I realize why that does not work . So my only chance is to change the encoding of the connection? – Tobias Bambullis Mar 15 '16 at 13:38
  • Changing the connection encoding would indeed be the *preferred* method; let the database give you the encoding you need. The alternative is to manually convert the encoding as you've been attempting to, however you'd need to figure out what encoding you need to convert *from*. – deceze Mar 15 '16 at 13:55
  • okay, thank you very much, now i know where to search. Can you copy your comment as the answer? Then I'll mark it as correct... – Tobias Bambullis Mar 15 '16 at 14:05
  • Set up your PHP environment and MySQL properly for UTF-8: http://stackoverflow.com/questions/279170/utf-8-all-the-way-through – Alastair McCormack Mar 15 '16 at 17:38

1 Answers1

1

In a nutshell: your ODBC connection does not return the data in the same encoding as you expect, it's returning it in some weird speciality encoding of some ANSI codepage probably. iconv from ASCII doesn't work because ASCII doesn't contain the letter ü. Detecting the encoding is unreliable anyway, so don't pay any attention to it. utf8_encode only works with Latin-1, which apparently ODBC doesn't return.

Changing the connection encoding would be the preferred method to solve this; let the database give you the encoding you need. The alternative is to manually convert the encoding as you've been attempting to, however you'd need to figure out what encoding you need to convert from.

deceze
  • 510,633
  • 85
  • 743
  • 889