I'm trying to stringify the output of openssl_public_encrypt and other OpenSSL functions in PHP, but the output doesn't seem to be UTF-8 encoded. Here is sample code that reproduces the problem in a nutshell.

<?php
  $pubkey=<<<EOD
  -----BEGIN PUBLIC KEY-----
  MIIBIjANBgkqhkiG9w0BAQEFAAOCAQ8AMIIBCgKCAQEAueWffJhr4j+PZhf4QlFF
  1HEmcu9d93YYBIQdZBZLWx4uqxsZ6Q3FaBVMkHh0h+sDHx1je2fQprTEMjWGSIu0
  HlXRZqPLkVUCpQg2j1oQk2BbZExS6kyziVa1G9ai094WqMz3MjyimOvJxuCAsb+i
  rQ/HaC2+vBAdm8wjLYEkqe/q7Q6Tnf+U6bpPYASXTz0WlLJj/G2LLTpEYzF3IgTB
  tRsTI6hwpmpHzpKUucEvliEesEPMAs3xp4AaKBdqKQoGFsiA2p1jxJIRUXC/ur7f
  2ZgWI59AtemVd+FRZfUapfe5uDD3M5cJy/6Uh9Yg+7vMzuCzi/yBDDFwyy4hD2RJ
  YwIDAQAB
  -----END PUBLIC KEY-----
  EOD;
  $jsontest= new \stdClass();
  $data="Testing some text ÆØåæøåéè";
  openssl_public_encrypt($data,$encrypted,$pubkey,OPENSSL_PKCS1_OAEP_PADDING);
  //Next line sometimes outputs UTF-8, but not consistently
  echo "\n\ndata1: ".mb_detect_encoding($encrypted)."\n";
  $jsontest->data1=$encrypted;
  $data="Testing some other text ÆØåæøåéè";
  openssl_public_encrypt($data,$encrypted,$pubkey,OPENSSL_PKCS1_OAEP_PADDING);
  //Next line sometimes outputs UTF-8, but not consistently
  echo "\n\ndata2: ".mb_detect_encoding($encrypted)."\n";
  $jsontest->data2=$encrypted;
  header('Content-Type: application/json; charset=UTF-8');
  //print_r($jsontest);
  $json=null;
  try {
      $json = json_encode($jsontest, JSON_THROW_ON_ERROR);
  } catch (JsonException $e) {
      echo 'Error:'.$e;
  }
  if($json)echo "JSON output:\n$json";
?>

The expected output would be a stringified JSON object with UTF-8 encoded property values. Instead I get this error message:

"Error:JsonException: Malformed UTF-8 characters, possibly incorrectly encoded in 'the php file':24"

When I run the above code snippet, the mb_detect_encoding lines output 'UTF-8' sometimes, but not always.

There seems to be a problem in openssl_public_encrypt: its output does not conform to UTF-8 encoding.

The strange part is that mb_detect_encoding probably does not detect the encoding correctly, because json_encode fails every time, and openssl_public_encrypt is probably to blame for this behavior.

Anyway, I can't stringify the supposedly UTF-8 encoded output from openssl_public_encrypt. For now I base64-encode the encrypted data as a workaround, but the resulting data is around double the size of the original data.

I use OpenSSL in PHP to encrypt/decrypt with RSA, ECDH and AES, in conjunction with the JavaScript Web Crypto API.

Can anybody help me solve this? I am probably not the only one who has this problem.

Edit:

Got it wrong! The json_encode function in PHP is the showstopper! It doesn't accept UTF-8 encoded JSON strings, although JSON is specified for UTF-8 to my knowledge. The data is certainly accepted and retrieved OK by file_get_contents("php://input"). Is there any reason for that?

gerteb
  • Possible duplicate of [Binary Data in JSON String. Something better than Base64](https://stackoverflow.com/questions/1443158/binary-data-in-json-string-something-better-than-base64) – Chris White Apr 29 '19 at 22:06
  • I have edited the code and comments to clarify my problem. – gerteb Apr 29 '19 at 22:54

1 Answer

"Malformed UTF-8 characters" means the input data contains byte sequences that are not valid UTF-8.
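
To see this in isolation, here is a minimal sketch. The byte string is a hard-coded stand-in for ciphertext (not real openssl_public_encrypt output), the variable names are made up for the example, and it assumes the mbstring extension is loaded.

```php
<?php
// Hypothetical stand-in for openssl_public_encrypt() output.
// The bytes 0xFF and 0xFE can never appear in well-formed UTF-8.
$ciphertext = "\xFF\xFE\x80\x01 raw bytes";

// mb_check_encoding() validates the whole string instead of guessing an encoding.
$isUtf8 = mb_check_encoding($ciphertext, 'UTF-8');

// Without JSON_THROW_ON_ERROR, json_encode() returns false on failure
// and json_last_error() reports the reason.
$json = json_encode(['data1' => $ciphertext]);
$failedOnUtf8 = ($json === false) && (json_last_error() === JSON_ERROR_UTF8);

var_dump($isUtf8);       // bool(false)
var_dump($failedOnUtf8); // bool(true)
```

Ciphertext is effectively random bytes, so it will only pass as UTF-8 by accident, which would explain why a detection heuristic reports 'UTF-8' on some runs while json_encode still fails.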

If the data is hard-coded, save your file in UTF-8 (no BOM) encoding.

If not, use iconv to convert or check the input data.

Encrypted data is binary, so you may need to base64-encode it before running json_encode:

$jsontest->data1 = base64_encode($encrypted);
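
A sketch of the full round trip, under some assumptions: random_bytes() stands in for the real ciphertext so the example runs without a key, 256 bytes is used because that is the ciphertext size for a 2048-bit RSA key, and $payload is a name invented for the example.

```php
<?php
// Stand-in for real ciphertext: arbitrary binary, 256 bytes (RSA-2048 output size).
$encrypted = random_bytes(256);

// Sending side: base64 turns the bytes into plain ASCII, which is always valid UTF-8.
$payload = new stdClass();
$payload->data1 = base64_encode($encrypted);
$json = json_encode($payload, JSON_THROW_ON_ERROR);

// Receiving side: strict mode makes base64_decode() return false on invalid input.
$decoded  = json_decode($json, false, 512, JSON_THROW_ON_ERROR);
$restored = base64_decode($decoded->data1, true);

var_dump($restored === $encrypted); // bool(true): the bytes survive the round trip
echo strlen($payload->data1), "\n"; // 344 characters for 256 bytes
```

Base64 emits 4 characters for every 3 input bytes, so the encoding itself adds roughly a third on top of the ciphertext.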
shingo
  • The whole point is to avoid base64 encoding and the resulting overhead. I'm considering writing the REST I/O in Node.js with Web Crypto, and doing the database and data manipulation in PHP through Node.js. That would solve several problems with PHP's OpenSSL and JSON encoding/decoding. But that is in the future. For now I am stuck with the base64 workaround. – gerteb May 01 '19 at 06:19
  • The overhead problem is solved by using pako (https://github.com/nodeca/pako) to gzip the data before encrypting it, and gzdecode in PHP to unpack it. Depending on entropy, the encrypted size is way smaller if the data to encrypt is 1 KB or greater. For ~1 KB of input, the base64-encoded encrypted output is ~77% of the input size with gzip compression. Great :-) – gerteb May 06 '19 at 19:29
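
The compress-then-encode idea from the last comment can be sketched in PHP alone. This is an illustration under assumptions, not the commenter's code: the sample input is invented and highly repetitive, the encryption step is omitted, both directions run in PHP instead of pako on the JavaScript side, and the zlib extension must be loaded.

```php
<?php
// Hypothetical, repetitive sample input of roughly 1 KB; real data compresses differently.
$plain = str_repeat('{"name":"Testing some text","value":12345},', 24);

// Compress BEFORE encrypting: ciphertext looks random and does not compress.
$compressed = gzencode($plain, 9);

// The encryption step (for example AES) would go here; it is omitted in this sketch.
$transport = base64_encode($compressed);

printf("plain: %d bytes, gzip+base64: %d bytes\n", strlen($plain), strlen($transport));

// Receiving side: base64-decode, (decrypt,) then decompress.
$restored = gzdecode(base64_decode($transport, true));
var_dump($restored === $plain); // bool(true)
```

How much this saves depends entirely on how compressible the input is; short or high-entropy data may not shrink at all.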