I have a function that returns a byte array, implemented in both C++ and Java with the same logic.

The returned byte array is the same in both cases. In C++, when I print the array after converting it to a std::string like this:

std::string str(byteArray,byteArray+len)

I see the output correctly, but in Java, when I do something like this:

new String(byteArray,"UTF-8")

I get unknown characters on the terminal. How can I get the same output as in C++?

Biffen
kumarD
  • Some code may actually help. – Aleksandr Apr 03 '18 at 10:27
  • @AleksandrMukhalov i just want to know if there is a statement which is equivalent to std::string str(byteArray,byteArray+len) in java – kumarD Apr 03 '18 at 10:30
  • Are you sure UTF-8 is the right encoding, and the c++ constructor uses that? – kutschkem Apr 03 '18 at 10:30
  • I am not sure, if not UTF-8 what else could it be ? – kumarD Apr 03 '18 at 10:31
  • @kumarD Well what is your byte array? What is the source string? – kutschkem Apr 03 '18 at 10:32
  • @kutschkem the byte array is a result of AES decryption from openssl library. – kumarD Apr 03 '18 at 10:34
  • @kumarD What other encoding could it be: latin-1, utf-16, utf-32... It could be anything, and the only way to tell for us is by looking at the actual bytes in your array. That it came from ssl doesn't matter, what is important is what went in (is this a html page, what encoding does it have?) – kutschkem Apr 03 '18 at 10:50
  • Have you confirmed that the byte array is identical after decoding in C++ vs decoding in Java? – Thomas Timbul Apr 03 '18 at 10:51
  • @ThomasTimbul yes they are. – kumarD Apr 03 '18 at 10:56
  • Taking info from here: https://stackoverflow.com/questions/1673445/how-to-convert-unsigned-char-to-stdstring-in-c I read this as "C++ does not use byte arrays, it is a sequence of unsigned chars". Could it be incorrect to treat them as bytes in Java? – Thomas Timbul Apr 03 '18 at 11:31
  • @ThomasTimbul so what is the exact solution for this? – kumarD Apr 03 '18 at 11:42
  • The short response is that I do not know. I think we need more information, like the input array as @kutschkem has asked, and the expected output as a minimum. I am no C++ expert, but I wonder if perhaps other information is also relevant to figure out what your C++ implementation is doing, for example the processor/platform it was compiled for. I am guessing that this may allow clues as to big/little endian or other implicit platform defaults. – Thomas Timbul Apr 03 '18 at 12:07
  • Was the input to the AES encryption text? If so, which character encoding? If not, don't use a Java text datatype (String, char or Character). – Tom Blodget Apr 04 '18 at 23:52

3 Answers

Here's the problem. When you do this:

    new String(byteArray,"UTF-8")

you are saying to the runtime system this:

The byte array contains character data that has been encoded as UTF-8. Convert it into a sequence of Unicode code points [1] and give them to me as a Java String.

But the bytes in the byte array are clearly NOT a well-formed UTF-8 sequence, because you are getting stuff that looks like garbage.
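
One way to confirm this rather than judging by eye is to decode strictly. The following is a minimal sketch, not code from the question: the class name `Utf8Check`, the helper `isWellFormedUtf8`, and the sample bytes are all made up for illustration. It relies on the fact that `new String(bytes, charset)` never throws on bad input and instead substitutes U+FFFD, while a `CharsetDecoder` set to `CodingErrorAction.REPORT` raises an exception.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, not part of the original question.
final class Utf8Check {
    // Returns true only if the bytes form a well-formed UTF-8 sequence.
    static boolean isWellFormedUtf8(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Made-up sample data: valid UTF-8 text and an invalid byte sequence.
        byte[] text = "h\u00e9llo".getBytes(StandardCharsets.UTF_8);
        byte[] junk = {(byte) 0xC3, (byte) 0x28, (byte) 0xFF};

        System.out.println(isWellFormedUtf8(text)); // true
        System.out.println(isWellFormedUtf8(junk)); // false

        // The String constructor silently replaces bad bytes with U+FFFD.
        String lossy = new String(junk, StandardCharsets.UTF_8);
        System.out.println(lossy.contains("\uFFFD")); // true
    }
}
```

If a check like this returns false for the decrypted data, the "unknown characters" on the terminal are most likely U+FFFD replacement characters.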

So what is going on? Well I think that there are two possibilities:

  1. The bytes in the array could actually be characters in a different character encoding. It is clearly not ASCII data because pure 7-bit ASCII is also well-formed as UTF-8. But the bytes could be encoded in some other character encoding. (If we actually had the byte values, we might be able to make an educated guess as to which encoding was used.)

  2. The bytes in the array could actually be garbled. You say that they were obtained by decrypting AES encrypted data. But if you somehow got the decryption incorrect (e.g. you used the wrong key), then you would end up with garbled stuff.
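
To tell these two cases apart, it helps to look at the raw byte values instead of decoded text. Below is a small sketch of a hex dump; the class `HexDump`, the method `toHex`, and the sample bytes are assumptions for illustration only. Printing the same dump from the C++ side would let you compare the two arrays byte for byte.

```java
// Hypothetical helper class, not part of the original question.
final class HexDump {
    // Formats each byte as two lowercase hex digits, separated by spaces.
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 3);
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            // Mask with 0xFF because Java bytes are signed (-128..127).
            sb.append(String.format("%02x", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Made-up sample data standing in for the decrypted array.
        byte[] sample = {0x48, 0x69, (byte) 0xE9, (byte) 0xFF};
        System.out.println(toHex(sample)); // 48 69 e9 ff
    }
}
```

The masking step also covers the signed `byte` versus `unsigned char` difference between Java and C++: the bit patterns are the same, only their numeric interpretation differs.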

Finally, the closest equivalent in Java to std::string str(byteArray,byteArray+len) is this:

  new String(byteArray, "ISO-8859-1")

This is because each encoded byte in an ISO-8859-1 (Latin-1) sequence is equal in value to the equivalent Unicode code point.
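
As a rough illustration of that property, the sketch below round-trips all 256 byte values through a String without loss. The class name `Latin1RoundTrip` and its helper methods are invented for this example.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper class, not part of the original question.
final class Latin1RoundTrip {
    // Maps each byte 0x00..0xFF to the char with the same numeric value.
    static String bytesToString(byte[] bytes) {
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    // Inverse mapping; lossless for strings produced by bytesToString.
    static byte[] stringToBytes(String s) {
        return s.getBytes(StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        byte[] all = new byte[256];
        for (int i = 0; i < 256; i++) {
            all[i] = (byte) i;
        }
        String s = bytesToString(all);
        System.out.println(s.length());                            // 256
        System.out.println(s.charAt(0xE9) == 0xE9);                // true
        System.out.println(Arrays.equals(all, stringToBytes(s)));  // true
    }
}
```

Note that printing such a String still passes through the encoding of `System.out`, so the terminal output can differ from what C++ shows. If the goal is only to send the raw bytes to the terminal as C++ does, `System.out.write(byteArray, 0, byteArray.length)` followed by `System.out.flush()` avoids decoding altogether.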

It is unclear whether that would actually work in your case. Certainly, it won't work if the bytes were garbled due to incorrect encryption or decryption, or due to corruption of the encrypted data in transmission.


1 - actually, UTF-16 code units ... but that's another story.

Stephen C

In Java, I convert a byte array as shown below. Specifying "UTF-8" might be what causes the problem in your case.

new String(byteArray);

Also try:

 new String(byteArray,"UTF-16");

If neither of the above works, you can try the following (note that UnicodeEncoding is a .NET class, so this snippet is C# rather than Java):

 UnicodeEncoding uEncoding = new UnicodeEncoding();
 string stringContent=uEncoding.GetString(byteArray);

For details, also read http://www.oracle.com/us/technologies/java/supplementary-142654.html

Abhijit Pritam Dutta
  • This isn't working for me, hence the existence of the question, i am facing encoding issues like. .�[~:D�}��m�,DZ1����U`�]'���5kKx�E����o�W�tw& �HK�"(e�{�"�����W|�A���r�"��;Õ}��9=�sT�7��v��rA}��a�4n#���h��m��PYn��V�R�fS��� ���x!�s�p�IU����Xĩۨ���Y�I̫�ޥ.�� – kumarD Apr 03 '18 at 10:35
  • I think you need to try with all the character set. Better contact the source team in what format they are sending the information and decode accordingly. – Abhijit Pritam Dutta Apr 03 '18 at 10:37
  • it works very well with c++, as you can see in my question. Facing problems in java. – kumarD Apr 03 '18 at 10:37
  • Which library is that function from? – kumarD Apr 03 '18 at 10:50
  • Please read about this http://www.oracle.com/us/technologies/java/supplementary-142654.html. – Abhijit Pritam Dutta Apr 03 '18 at 10:54

So, here is the solution. The problem was that the decryption was not going through properly: it was only partial, so some characters made sense and the rest were junk. My blunder was using SHA-512 as the message digest algorithm during encryption and MD5 during decryption.
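
To show why mismatched digests break decryption, here is a rough sketch. The actual key derivation used in my code is not shown here, so `deriveKey` below is a hypothetical stand-in (hash the passphrase and keep the first 16 bytes), and the passphrase is made up. It only demonstrates that the same passphrase yields different key bytes under different digests.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;

// Hypothetical illustration, not the real derivation used by OpenSSL.
final class DigestMismatch {
    // Assumed scheme: hash the passphrase, keep the first 16 bytes
    // (the size of an AES-128 key).
    static byte[] deriveKey(String passphrase, String digestName) {
        try {
            MessageDigest md = MessageDigest.getInstance(digestName);
            byte[] hash = md.digest(passphrase.getBytes(StandardCharsets.UTF_8));
            return Arrays.copyOf(hash, 16);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("Digest not available: " + digestName, e);
        }
    }

    public static void main(String[] args) {
        byte[] encryptKey = deriveKey("secret", "SHA-512"); // digest used when encrypting
        byte[] decryptKey = deriveKey("secret", "MD5");     // digest used when decrypting

        // Different digests give different key material for the same passphrase.
        System.out.println(Arrays.equals(encryptKey, decryptKey));
        // The same digest on both sides gives the same key.
        System.out.println(Arrays.equals(encryptKey, deriveKey("secret", "SHA-512")));
    }
}
```

With a different key (or IV) on the decrypting side, the output is junk regardless of which charset is used to turn it into a String.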

Cheers!!

kumarD