
I'm working on a script that I've run into a bit of an issue with.

The script expects strings to be passed to it as byte values. As an example, I have the string:

61,68,71,61,68,101,118,105,101,116,104

which turns out to be =DG=Devieth. The following code takes that line and translates it successfully:

$sv_reportee = implode(array_map('chr', explode(',', $_GET['defendant'])));
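
Wrapped in a function, that line can be sketched as follows. `bytesToString` is a hypothetical name. The extra `intval` step is an addition that only makes the string-to-integer conversion explicit, and the result is the same raw byte string the one-liner produces.

```php
<?php
// Hypothetical helper wrapping the one-liner from the question: turns a
// comma-separated list of byte values ("61,68,71") into a raw byte string.
function bytesToString(string $csv): string
{
    // explode() splits on commas, intval() converts each piece to an int,
    // and chr() maps each int to exactly one raw byte.
    return implode('', array_map('chr', array_map('intval', explode(',', $csv))));
}

$decoded = bytesToString('61,68,71,61,68,101,118,105,101,116,104');
echo $decoded, PHP_EOL; // prints "=DG=Devieth"
```
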

Now, let's say I change that string to contain 171 («) and 187 (»). The script produces no warnings, no notices, nothing at all... it just stops working with that variable. The other functions still run fine, but print($sv_reportee) outputs absolutely nothing for it.

This was my reference for the above line of code: PHP Get String Text From Bytes

Now, from what I understand, chr() should be able to handle values from 0 to 255 on the ASCII table, right? Or is there another way that I should or could be doing this that doesn't involve the above line of code?

Worth mentioning, due to a limitation in another aspect of the application, the string must be sent in byte form. There's unfortunately no other way around this - we've exhausted all of our other possible options.
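
Because the string has to travel as byte values, a round-trip check can help narrow things down. The sketch below assumes a hypothetical sender-side helper, `stringToBytes`, and assumes the sender already holds the text as ISO-8859-1 bytes. Decoding uses the unchanged line from the question.

```php
<?php
// Hypothetical sender-side helper: turns a raw byte string into the
// comma-separated list of byte values the script expects.
function stringToBytes(string $raw): string
{
    // unpack('C*', ...) yields each byte as an unsigned integer (0-255).
    return implode(',', unpack('C*', $raw));
}

// « and » as single ISO-8859-1 bytes (0xAB and 0xBB).
$latin1 = "\xAB=DG=Devieth\xBB";
$list = stringToBytes($latin1);
echo $list, PHP_EOL; // prints "171,61,68,71,61,68,101,118,105,101,116,104,187"

// Decoding with the line from the question restores the exact same bytes.
$decoded = implode(array_map('chr', explode(',', $list)));
var_dump($decoded === $latin1); // bool(true)
```

So the bytes themselves survive the trip intact; any "missing" output has to come from how those bytes are later interpreted as characters.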

Asked by mywarthog
  • Hello, ASCII is 0-127; anything beyond that needs an encoding, I believe. There is such a thing as "extended ASCII" [here](http://www.theasciicode.com.ar/extended-ascii-code/one-half-ascii-code-171.html), but the values do not match yours (171 == ½ rather than a quote). Your example does match Latin-1 (https://en.wikipedia.org/wiki/ISO/IEC_8859-1), so I'd say try [this](http://stackoverflow.com/a/16165685/3727050) to re-encode to UTF-8... – urban Nov 01 '16 at 09:18
  • @urban Thanks for the link - surrounding the string with utf8_encode() worked, as well as setting the headers type. – mywarthog Nov 01 '16 at 21:05


What chr does is translate an integer into a raw byte, meaning:

  chr(171)
→ "\xAB"
= 1010 1011
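
The same thing can be observed from PHP itself. This sketch uses only core functions. The `preg_match('//u', ...)` check relies on PCRE returning false when the subject is not valid UTF-8.

```php
<?php
$byte = chr(171);

// One byte: hex AB, bits 1010 1011.
var_dump(strlen($byte));      // int(1)
var_dump(bin2hex($byte));     // string(2) "ab"
var_dump(decbin(ord($byte))); // string(8) "10101011"

// On its own, 0xAB is not valid UTF-8: preg_match() with /u returns false.
var_dump(preg_match('//u', $byte)); // bool(false)

// "«" (U+00AB) encoded as UTF-8 is two different bytes: C2 AB.
var_dump(bin2hex("\u{AB}"));  // string(4) "c2ab"
```
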

That is all. 171 does not equal the character "«". All it equals is the byte 0xAB. How that is translated into characters is a different story and depends on what encoding that byte is interpreted as. 0xAB happens to equal "«" in the ISO-8859-1 encoding. Assuming you're testing this in the browser, this'll output "«":

header('Content-Type: text/html; charset=iso-8859-1');
echo chr(171);

Here you're telling the browser explicitly which encoding to interpret the data as. If "nothing" shows up, whatever is interpreting the bytes as characters is most likely using an encoding in which 0xAB on its own doesn't mean anything. If you don't want to use ISO-8859-1 (and these days you typically shouldn't), you'll need to convert the data to another encoding:

header('Content-Type: text/html; charset=utf-8');
echo iconv('ISO-8859-1', 'UTF-8', chr(171));
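
Because every ISO-8859-1 byte value is also its Unicode code point (U+0000 to U+00FF), the same conversion can be sketched without the iconv extension. `latin1ToUtf8` is a hypothetical helper, and it assumes the incoming bytes really are ISO-8859-1; bytes in any other encoding would be mis-converted.

```php
<?php
// Hypothetical helper: converts an ISO-8859-1 string to UTF-8 by hand.
function latin1ToUtf8(string $latin1): string
{
    $out = '';
    $len = strlen($latin1);
    for ($i = 0; $i < $len; $i++) {
        $b = ord($latin1[$i]);
        if ($b < 0x80) {
            // ASCII range: copied unchanged as a single byte.
            $out .= chr($b);
        } else {
            // 0x80-0xFF: two-byte UTF-8 sequence 110xxxxx 10xxxxxx.
            $out .= chr(0xC0 | ($b >> 6)) . chr(0x80 | ($b & 0x3F));
        }
    }
    return $out;
}

// Decode the byte list as in the question, then convert before output.
$raw = implode(array_map('chr', explode(',', '171,61,68,71,61,68,101,118,105,101,116,104,187')));
$utf8 = latin1ToUtf8($raw);
```

Served with the `charset=utf-8` header shown above, `$utf8` should then display as «=DG=Devieth».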

Answered by deceze
  • Yes, perfect. The header() along with surrounding the string with utf8_encode() did the trick perfectly. Thanks! – mywarthog Nov 01 '16 at 21:04