I have the following code in PHP:

$test = "\151\163\142\156";
echo utf8_decode($test);
var_dump($test);

and I get the following result:

isbn
string(4) "isbn"

I also read some text from a .txt file that contains the literal text \151\163\142\156:

$all_text = file_get_contents('test.txt');
var_dump($all_text);

Result:

string(16) "\151\163\142\156"

I have the following questions:

  1. How can I decode the second text so that I get the isbn result?

  2. How can I encode isbn to get \151\163\142\156?

EDIT

(from comments)

I tried everything with iconv and the encode functions, but nothing worked. The text from the .txt file is string(16) and not string(4), so I can't decode it. The .txt file was saved from Sublime Text with Western (ISO 8859-1) encoding.

Martin
Ind F. Ashiku
  • Seriously? `utf8_decode($all_text);`? `utf8_encode('WHAT YOU NEED')`? – u_mulder Jun 21 '16 at 20:13
  • That does not work. I get the same text and not the encoded one. As you can see, the first one ($test) is string(4) and the second is string(16). – Ind F. Ashiku Jun 21 '16 at 20:18
  • Do you decode second string? Show output of __DECODED__ `$all_text` – u_mulder Jun 21 '16 at 20:19
  • `$all_text_utf8_decoded = utf8_decode(file_get_contents('test.txt'));` and http://php.net/manual/de/function.utf8-encode.php – Jens A. Koch Jun 21 '16 at 20:20
  • does that matter, as `file_get_contents` is a reference point for the file rather than a container for the data itself? – Martin Jun 21 '16 at 21:20
  • I'm writing a big update to my answer so will add that in as it may be useful although I don't personally think it would cause an encoding issue, but I'm far from certain. `:-)` – Martin Jun 21 '16 at 21:24
  • haha, I have credited it to you :-) I'm not in a position to test the quotes just now, although reading the SO link about double and single quotes doesn't mention anything about encodings...... – Martin Jun 21 '16 at 21:27
  • Just edited my answer to show how to convert chars to codes. Enjoy! (upvote if useful ^_º). – Jose Manuel Abarca Rodríguez Jun 22 '16 at 14:14

2 Answers

Try using stripcslashes, which interprets C-style escape sequences (including octal ones) found in a string:

<?php

$test = "\151\163\142\156";
echo utf8_decode( $test );                         // "isbn"
var_dump( $test );

echo "<br/><br/><br/>";

$all_text = file_get_contents( "test.txt" );
echo utf8_decode( $all_text ) .                    // "\151\163\142\156"
     "<br/>" .
     utf8_decode( stripcslashes( $all_text ) );    // "isbn"
var_dump( stripcslashes( $all_text ) );

?>

Tested with this file:

This is some text :

\151\163\142\156

And this is more text!!!

Next is how to convert characters to their octal codes:

<?php
$test = "isbn";
$coded = "";
for ( $i = 0; $i < strlen( $test ); $i++ ) // PROCESS EACH CHAR IN STRING.
  $coded .= "\\" . decoct( ord( $test[ $i ] ) ); // CHAR CODE TO OCTAL.

echo $coded .                           // "\151\163\142\156"
     "<br/>" .
     stripcslashes( $coded );           // "isbn".
?>

Let's make it more general with a function that we can call anywhere:

<?php
function code_string( $s )
{
  $coded = "";
  for ( $i = 0; $i < strlen( $s ); $i++ )         // PROCESS EACH CHAR IN STRING.
    $coded .= "\\" . decoct( ord( $s[ $i ] ) );   // CHAR CODE TO OCTAL.
  return $coded;
}

$x = code_string( "isbn" );
echo $x .                           // "\151\163\142\156"
     "<br/>" .
     stripcslashes( $x );           // "isbn".
?>
1

This has absolutely nothing to do with UTF-8 encoding. Forget about that part entirely. utf8_decode doesn't do anything in your code. iconv is entirely unrelated.

It has to do with PHP string literal interpretation. The \... in "\151\163\142\156" is a special PHP string literal escape sequence:

\[0-7]{1,3}
the sequence of characters matching the regular expression is a character in octal notation, which silently overflows to fit in a byte (e.g. "\400" === "\000")

http://php.net/manual/en/language.types.string.php#language.types.string.syntax.double

That explains why it works when written in a PHP string literal and doesn't work when the text is read from an outside source: the text returned by file_get_contents is not interpreted as PHP code. Simply do echo "\151\163\142\156"; and you'll see isbn without any other conversion.
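
A minimal sketch of that difference. It assumes the file holds exactly the 16 characters from the question and simulates the file content with a single-quoted literal, which PHP does not interpret:

```php
<?php
// Double quotes: PHP interprets each \NNN octal escape when parsing the literal.
$literal = "\151\163\142\156";

// Single quotes: backslashes and digits stay as they are, which is what
// file_get_contents() would return for a file holding this text.
$fromFile = '\151\163\142\156';

var_dump($literal);   // string(4) "isbn"
var_dump($fromFile);  // string(16) "\151\163\142\156"
```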

To manually convert the individual escape sequences in the string \151\163\142\156 to their character equivalents (really: their byte equivalents):

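$string = '\151\163\142\156';  // note: single quotes cause no interpretation
// Match a literal backslash followed by one to three octal digits.
echo preg_replace_callback('/\\\\([0-7]{1,3})/', function ($m) {
    return chr(octdec($m[1]));  // octal digits -> byte value -> character
}, $string);
// isbn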

stripcslashes happens to include this functionality, but it also converts other C-style escapes (such as \n, \t and \x hex sequences) and removes remaining backslashes, which may be undesired.
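
To illustrate the difference, here is a small sketch with a made-up input string (an assumption, not taken from the question) that contains both an octal escape and a backslash that should stay literal:

```php
<?php
// Hypothetical input: "\151" is an octal escape, "\t" is meant to stay as typed.
$raw = 'path\to\151';

// Targeted replacement: only \NNN octal sequences are converted.
$viaRegex = preg_replace_callback('/\\\\([0-7]{1,3})/', function ($m) {
    return chr(octdec($m[1]));
}, $raw);

// stripcslashes() converts the octal escape too, but it also turns \t into a tab.
$viaStrip = stripcslashes($raw);

var_dump($viaRegex === 'path\toi');  // compares against a literal backslash + "t"
var_dump($viaStrip === "path\toi");  // compares against a real tab character
```

Which behaviour is preferable depends on whether the input may contain other backslashes that must survive.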

The other way around:

$string = 'isbn';
// Replace every character with a backslash followed by its octal code.
echo preg_replace_callback('/./', function ($m) {
    return '\\' . decoct(ord($m[0]));
}, $string);
// \151\163\142\156
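
Both directions can be wrapped in a pair of helpers. This is only a sketch: the names octal_encode and octal_decode are hypothetical, and the "s" modifier is an addition so that newline bytes are encoded as well.

```php
<?php
// Hypothetical helper: every byte becomes a backslash plus its octal code.
function octal_encode(string $s): string
{
    // The "s" modifier lets "." match newline bytes as well.
    return preg_replace_callback('/./s', function ($m) {
        return '\\' . decoct(ord($m[0]));
    }, $s);
}

// Hypothetical helper: the inverse, touching only \NNN octal sequences.
function octal_decode(string $s): string
{
    return preg_replace_callback('/\\\\([0-7]{1,3})/', function ($m) {
        return chr(octdec($m[1]));
    }, $s);
}

$encoded = octal_encode('isbn');
echo $encoded, "\n";                // \151\163\142\156
echo octal_decode($encoded), "\n";  // isbn
```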
Community
deceze