
Everything in my stack appears to be set up for UTF-8: my database (PostgreSQL) uses UTF-8 encoding, and I've checked that php.ini is set to UTF-8 as well. I tried debugging to see whether any of the functions I use were causing this, but everything runs as expected. However, after my frontend sends a POST request to the backend server through curl for some text to be inserted into the database, some character sequences such as 'da' are converted to '?' in PostgreSQL and in memcached. I think PHP is converting them to Latin-1 again after the request reaches the other side for some reason, because I use utf8_encode before the request and utf8_decode on the other side.

This is the code that sends the request:

    $pre_opp->Send_Request_To_BackEnd("/Settings", $school_name, $uuid, "Upload_Bio", "POST", str_replace(" ", "%", utf8_encode($bio)));

This is how the backend receives it:

    $data = str_replace("%", " ", utf8_decode($_POST["Data"]));
  • Why are you replacing spaces with `%`? It would cause `foo da baz` to become `foo%da%baz`, and when that is decoded it becomes `fooںz`. Research [percent encoding](https://en.wikipedia.org/wiki/Percent-encoding). – Lawrence Cherone May 02 '21 at 20:43
  • I do that because curl gives an error when there is whitespace in the request fields. I have also tried it the other way around: first replacing whitespace with %, then encoding, and on the backend decoding, then replacing the % with whitespace again. That doesn't work either. – TheVastNetwork May 02 '21 at 20:47
  • I think I know why. After reading about percent encoding like you suggested, I found out that % will be interpreted as an escape regardless of what I do, since I am sending an HTTP request. But that doesn't explain why it works for most characters and only malfunctions when the bio contains 'da'. – TheVastNetwork May 02 '21 at 20:50

2 Answers


Don't replace " " with "%".

Use urlencode and urldecode instead of utf8_encode and utf8_decode - It will give you a clean alphanumeric representation of any character to easily transport your data.
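To make the suggestion concrete, here is a minimal sketch of the round trip. The sample text is made up for illustration, and the non-ASCII character is written as explicit UTF-8 bytes so the result does not depend on the encoding of the source file.

```php
<?php
// Hypothetical sample bio: contains spaces, a literal "%", and a UTF-8 "é".
$bio = "foo da baz 100% caf\xC3\xA9";

// urlencode() turns every byte that is unsafe in a URL or form body into
// a "+" (for spaces) or a %XX escape, so nothing is ambiguous in transit.
$encoded = urlencode($bio);
echo $encoded, PHP_EOL;            // foo+da+baz+100%25+caf%C3%A9

// urldecode() reverses it exactly.
$decoded = urldecode($encoded);
var_dump($decoded === $bio);       // bool(true)
```

Note that the literal "%" in the text survives because it is itself escaped as %25, which is exactly what the space-to-% replacement cannot guarantee.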

If everything in your environment defaults to UTF-8, you shouldn't need utf8_encode and utf8_decode anyway. But if you still do, you could try combining both like this:

Send_Request_To_BackEnd("/Settings",$school_name,$uuid,"Upload_Bio","POST", urlencode(utf8_encode($bio)));

and

$data = utf8_decode(urldecode($_POST["Data"]));
Paulo Amaral
  • Thanks, ill give it a try! – TheVastNetwork May 02 '21 at 20:56
  • Just to be clear, you almost certainly **don't** want utf8_encode and utf8_decode. They are not magic "fix my UTF-8 problems" functions, and calling them both on the same string is an almost certain sign that you're doing something wrong. – IMSoP May 02 '21 at 22:48

You say this like it's a mystery:

> I think PHP is converting them to Latin-1 again after the request reaches the other side for some reason

But then you give the reason yourself:

> because I use utf8_encode before the request and utf8_decode on the other side

That is exactly what utf8_decode does: it converts UTF-8 to Latin-1.

As the manual explains, this is also where your '?' replacements come from:

> This function converts the string string from the UTF-8 encoding to ISO-8859-1. Bytes in the string which are not valid UTF-8, and UTF-8 characters which do not exist in ISO-8859-1 (that is, characters above U+00FF) are replaced with ?.

Since you'd picked the unfortunate replacement of % for space, sequences like "%da" were being interpreted as URL percent escapes, and generating invalid UTF-8 strings. You then asked PHP to convert them to Latin-1, and it couldn't, so it substituted "?".
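The failure can be reproduced in a few lines. This is only a sketch: urldecode() stands in for the decoding PHP applies to form fields before filling $_POST, and mb_convert_encoding() stands in for utf8_decode(), which is deprecated in recent PHP versions.

```php
<?php
// What the frontend in the question effectively does to the text.
$bio  = "foo da baz";
$sent = str_replace(" ", "%", $bio);   // "foo%da%baz"

// On arrival, "%da" and "%ba" look like percent escapes and become raw bytes.
$received = urldecode($sent);
echo bin2hex($received), PHP_EOL;      // 666f6fdaba7a

// 0xDA 0xBA happens to be valid UTF-8 for U+06BA, a character outside
// Latin-1, so a UTF-8 to ISO-8859-1 conversion has to substitute it.
$latin1 = mb_convert_encoding($received, "ISO-8859-1", "UTF-8");

// Other inputs leave a lone lead byte behind, which is not valid UTF-8 at all.
$broken = urldecode(str_replace(" ", "%", "a da b"));
var_dump(mb_check_encoding($broken, "UTF-8"));   // bool(false)
```

Either way the original text is gone before the Latin-1 conversion runs; most other words survive only because the two letters after the space rarely happen to be hexadecimal digits.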

The simple solution is: don't do that. If your data is already in UTF-8, neither of those functions will do anything but mess it up; if it's not already in UTF-8, then work out what encoding it's in and use iconv or mb_convert_encoding to convert it, once. See also "UTF-8 all the way through".
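As a rough sketch of "convert it, once": the helper name `to_utf8_once` below is hypothetical, and it assumes you already know the source encoding. The validity check is only a heuristic guard against converting twice, since some Latin-1 byte sequences are also valid UTF-8 by coincidence.

```php
<?php
// Hypothetical helper: convert to UTF-8 only if the input is not already valid UTF-8.
function to_utf8_once(string $text, string $sourceEncoding): string
{
    if (mb_check_encoding($text, "UTF-8")) {
        return $text;                          // already UTF-8, leave it alone
    }
    return mb_convert_encoding($text, "UTF-8", $sourceEncoding);
}

$latin1 = "caf\xE9";                           // "café" in ISO-8859-1
$utf8   = to_utf8_once($latin1, "ISO-8859-1");
echo bin2hex($utf8), PHP_EOL;                  // 636166c3a9

// Calling the helper again is harmless...
var_dump(to_utf8_once($utf8, "ISO-8859-1") === $utf8);   // bool(true)

// ...whereas converting unconditionally a second time double-encodes the text.
$twice = mb_convert_encoding($utf8, "UTF-8", "ISO-8859-1");
echo bin2hex($twice), PHP_EOL;                 // 636166c383c2a9
```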

Since we can't see your Send_Request_To_BackEnd function, it's hard to know why you thought you needed the replacement. If you're constructing a URL or request body with that string, you should use urlencode inside your request-sending code; you shouldn't need to decode it at the other end, because PHP will do that for you.
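Since the real Send_Request_To_BackEnd is not shown, the following is only a sketch of what the encoding step inside it might look like. The field name "Data" is taken from the question; parse_str() is used here to imitate what PHP does to a form-encoded body before filling $_POST.

```php
<?php
$bio = "foo da baz";

// Sender side: http_build_query() applies urlencode() to each value.
$body = http_build_query(["Data" => $bio]);
echo $body, PHP_EOL;                 // Data=foo+da+baz

// With curl this string would be passed as the request body, e.g.
//   curl_setopt($ch, CURLOPT_POSTFIELDS, $body);
// which sends it as application/x-www-form-urlencoded.

// Receiver side: PHP decodes the body itself, so the backend can read
// $_POST["Data"] directly with no urldecode() or str_replace().
parse_str($body, $post);
var_dump($post["Data"] === $bio);    // bool(true)
```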

IMSoP
  • I did not know exactly what these functions were doing due to a lack of PHP knowledge on my side, but I figured it out thanks to the answer above. You're right that using urlencode was the way to go, and I've learned a lot about how PHP encodes data this way, so thank you for the answer. – TheVastNetwork May 03 '21 at 03:41