
I'm working on a receipt application in C#. I have a sample receipt that contains both English and Arabic words (see the JSON example below).


I have JSON text that I convert to hex and send in hex format in a socket message. My problem is that the text does not convert to hex correctly. Here is how I convert the text to hex:

string hexOutput = "";
byte[] ba = Encoding.Default.GetBytes(output);
hexOutput = BitConverter.ToString(ba);
hexOutput = hexOutput.Replace("-", "");

Here is my JSON example:

"PaymentReceipt":[
    {
       "DocumentQualifier":"CustomerReceipt",
       "OutputContent":{
          "OutputFormat":"Text",
          "OutputText":[
             {
                "Text":"------------------------------------------------------------"
             },
             {
                "Text":"XXXXXXXXXX2345"
             },
             {
                "Text":"Purchase Amount                    مبلغ الشراء"
             },
             {
                "Text":"5.00"
             },
             {
                "Text":""
             },
             {
                "Text":"Thanks for your visit             شكرا لزيارتكم"
             }
          ]

How can I correctly convert the text to hex?

saulyasar
  • `Encoding.Default` seems like a bad idea. It carries a [warning](https://learn.microsoft.com/en-us/dotnet/api/system.text.encoding.default?view=net-5.0#remarks) on the documentation page: _"Different computers can use different encodings as the default, and the default encoding can change on a single computer. If you use the Default encoding to encode and decode data streamed between computers or retrieved at different times on the same computer, it may translate that data incorrectly."_ – ProgrammingLlama May 18 '21 at 06:58
  • You ask how to correctly convert it into hex, but that isn't a simple question. Hex is a representation of binary data, but first you need to convert your text to binary data (`byte[]`). You can do this using a text encoder's `GetBytes` method, as you've attempted in your code. The problem is: you haven't mentioned what encoding should be used. At a guess, I'd think UTF8, but different human languages might have specific encodings that were produced for them, and are expected by the service you're calling. What encoding does the documentation ask for? – ProgrammingLlama May 18 '21 at 07:02
  • Are you planning to update the question with the encoding requirements? – ProgrammingLlama May 18 '21 at 08:20
  • Hi @Llama, sorry I couldn't check earlier. I tried with Encoding.UTF8.GetBytes() and it still does not convert correctly – saulyasar May 18 '21 at 08:29
  • So what is the correct encoding? You have to tell us. We can't tell you because it depends entirely on the requirements of the service you're passing data to. – ProgrammingLlama May 18 '21 at 08:32
  • Actually, I hardcoded the Arabic text in the JSON myself because I'm trying to simulate one case. Converting normal text works perfectly, but with Arabic it gets complicated, and on Android I can easily convert hex to string or string to hex – saulyasar May 18 '21 at 08:37
  • I think you misunderstand me. To turn text (`string`) into a `byte[]` you have to encode it. There are a multitude of standards for doing this. Some popular ones are ASCII and UTF8. ASCII only really works with "English" characters though. Because of this, encoding the same text with different encoders yields different results. [Example](https://rextester.com/TJBCFX64346). So you're effectively asking us to guess which encoding is correct for your situation. If you have Java code for doing this "correctly", then you should provide it so that we can use that as a basis for helping you. – ProgrammingLlama May 18 '21 at 08:45
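
The point made in the comments, that the same text produces different bytes under different encodings, can be seen in a small sketch. The class name `EncodingComparison` and the helper `HexOf` are hypothetical names chosen for illustration; only standard `System.Text.Encoding` members are used, and the sample word is taken from the receipt text in the question.

```csharp
using System;
using System.Text;

public static class EncodingComparison
{
    // Encodes the text with the given encoding and returns uppercase hex without separators.
    public static string HexOf(string text, Encoding encoding) =>
        BitConverter.ToString(encoding.GetBytes(text)).Replace("-", "");

    public static void Main()
    {
        string word = "شكرا";

        // ASCII has no Arabic letters, so each one is replaced by '?' (0x3F).
        Console.WriteLine("ASCII:  " + HexOf(word, Encoding.ASCII));   // 3F3F3F3F

        // UTF-8 uses two bytes for each of these letters.
        Console.WriteLine("UTF-8:  " + HexOf(word, Encoding.UTF8));    // D8B4D983D8B1D8A7

        // Encoding.Unicode is UTF-16 little-endian: two bytes per letter, low byte first.
        Console.WriteLine("UTF-16: " + HexOf(word, Encoding.Unicode)); // 3406430631062706
    }
}
```

Whichever encoding the sender picks, the receiver has to decode with the same one; decoding these bytes with a different encoding is one way to end up with `?` or other unexpected characters.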

2 Answers


UTF-8 is the most widely used encoding on the web (see "Usage of character encodings broken down by ranking"), so the code below assumes that both sides of the socket use UTF-8.

Code

The static HexManager class contains extension methods for converting to and from hex.

using System;
using System.Text;

public static class HexManager
{
    // Formats each byte as two lowercase hex digits.
    public static string ToHex(this byte[] data)
    {
        var builder = new StringBuilder(data.Length * 2);
        foreach (byte value in data)
        {
            builder.Append(value.ToString("x2"));
        }
        return builder.ToString();
    }

    // Parses a hex string (two digits per byte) back into bytes.
    public static byte[] FromHex(this string data)
    {
        if (data.Length % 2 != 0)
        {
            throw new FormatException("Invalid hex string: odd number of digits.");
        }

        byte[] result = new byte[data.Length / 2];
        for (int i = 0; i < result.Length; i++)
        {
            result[i] = Convert.ToByte(data.Substring(i * 2, 2), 16);
        }
        return result;
    }

    public static byte[] ToByteArray(this string data)
    {
        return Encoding.UTF8.GetBytes(data);
    }

    public static string ToStr(this byte[] data)
    {
        return Encoding.UTF8.GetString(data);
    }
}

public class Example
{
    public static void Main()
    {
        // Make the console emit UTF-8. On Windows, run the program in Windows Terminal
        // to see the Arabic text rendered correctly; Linux and macOS terminals default to UTF-8.
        Console.OutputEncoding = System.Text.Encoding.UTF8;
        string data = "Purchase Amount                    مبلغ الشراء";
        Console.WriteLine("Original: " + data);
        data = data.ToByteArray().ToHex();
        Console.WriteLine("Encoded: " + data);
        data = data.FromHex().ToStr();
        Console.WriteLine("Decoded: " + data);

        Console.WriteLine();
        Console.WriteLine();

        data = "Thanks for your visit             شكرا لزيارتكم";
        Console.WriteLine("Original: " + data);
        data = data.ToByteArray().ToHex();
        Console.WriteLine("Encoded: " + data);
        data = data.FromHex().ToStr();
        Console.WriteLine("Decoded: " + data);

        Console.ReadKey();
    }
}

Output

  • On Windows, I recommend installing Windows Terminal so that the console renders the UTF-8 output (the Arabic code points) correctly:
  1. Run wt.exe (Windows Terminal).
  2. Inside Windows Terminal, run the program.
  3. The Arabic text is displayed correctly.
  • On Linux or macOS, UTF-8 output works without extra setup.
PS C:\Users\Megam\source\repos\ConsoleApplication1\ConsoleAppCs\bin\Debug\net5.0> .\ConsoleAppCs.exe
Original: Purchase Amount                    مبلغ الشراء
Encoded: 507572636861736520416d6f756e742020202020202020202020202020202020202020d985d8a8d984d8ba20d8a7d984d8b4d8b1d8a7d8a1
Decoded: Purchase Amount                    مبلغ الشراء


Original: Thanks for your visit             شكرا لزيارتكم
Encoded: 5468616e6b7320666f7220796f757220766973697420202020202020202020202020d8b4d983d8b1d8a720d984d8b2d98ad8a7d8b1d8aad983d985
Decoded: Thanks for your visit             شكرا لزيارتكم
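
Since the run above targets net5.0, the same round trip can also be sketched with the built-in `Convert.ToHexString` and `Convert.FromHexString` methods, which are available from .NET 5 onward. The class name `BuiltInHexDemo` and its two helper methods are hypothetical names for illustration, and UTF-8 is assumed on both sides, as in the rest of this answer.

```csharp
using System;
using System.Text;

public static class BuiltInHexDemo
{
    // Encodes the text as UTF-8 and returns uppercase hex (requires .NET 5 or later).
    public static string TextToHex(string text) =>
        Convert.ToHexString(Encoding.UTF8.GetBytes(text));

    // Reverses TextToHex; Convert.FromHexString throws FormatException on malformed input.
    public static string HexToText(string hex) =>
        Encoding.UTF8.GetString(Convert.FromHexString(hex));

    public static void Main()
    {
        string original = "شكرا";
        string hex = TextToHex(original);

        Console.WriteLine(hex);                        // D8B4D983D8B1D8A7
        Console.WriteLine(HexToText(hex) == original); // True
    }
}
```

Note that `Convert.ToHexString` produces uppercase digits, while the `ToHex` extension above produces lowercase; `Convert.FromHexString` accepts either.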


Joma
  • Perfect example. Thanks @Joma. My problem was that encoding with UTF8 was OK, but on the response side I wasn't decoding with UTF8, which is why the values looked like ?????? – saulyasar May 18 '21 at 09:41
  • Instead of encoding individual JSON properties to hex, you can: **1.** generate all the JSON content, **2.** convert the whole JSON content to hex, **3.** on the server, decode the hex string back to the original string (the JSON content) – Joma May 18 '21 at 10:01
  • @saulyasar do you mean to say that this was an issue rendering the string in a console window after converting it back from hex? – ProgrammingLlama May 18 '21 at 12:46
  • On Windows the default encoding is UTF-16. Classic cmd.exe or PowerShell can't show the output correctly. Windows Terminal resolves this issue; it can handle UTF-8 or UTF-16 and print the characters correctly. – Joma May 19 '21 at 06:00
  • The original idea is to encode all of the generated JSON content in hex rather than encoding specific properties of your JSON content, which is more complicated to manage. – Joma May 19 '21 at 06:02
  • @Llama yes, because when the hex string was returned I took that hex value and tried an online converter, and it showed weird characters – saulyasar May 19 '21 at 11:07
  • @saulyasar Where are you showing the output on the server side? – Joma May 20 '21 at 20:02

Arabic characters can be encoded in UTF-8, so first convert the text to a byte array like this:

byte[] bytes = Encoding.UTF8.GetBytes(output);

After that, convert the bytes to a Base64 string or to whatever string format you need.

To convert to a Base64 string:

string base64Str = Convert.ToBase64String(bytes);

To get the original bytes back from the Base64 string:

byte[] originalBytes = Convert.FromBase64String(base64Str);

To get the original UTF-8 text from the byte array:

string originalResult = Encoding.UTF8.GetString(originalBytes);

You can encode to Base64 and decode back in one line each, as follows:

Encode

string encodedStr = Convert.ToBase64String(Encoding.UTF8.GetBytes(output));

Decode

string OriginalString = Encoding.UTF8.GetString(Convert.FromBase64String(encodedStr));
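
Base64 and hex are both text representations of the same UTF-8 bytes, so either works as long as the receiver expects that format. As a rough sketch of the difference, the following compares the two for one word from the receipt; `HexVersusBase64` and its helpers are hypothetical names used only for this illustration.

```csharp
using System;
using System.Text;

public static class HexVersusBase64
{
    // UTF-8 bytes rendered as uppercase hex, two characters per byte.
    public static string ToHex(string text) =>
        BitConverter.ToString(Encoding.UTF8.GetBytes(text)).Replace("-", "");

    // The same UTF-8 bytes rendered as Base64, four characters per three bytes (padded).
    public static string ToBase64(string text) =>
        Convert.ToBase64String(Encoding.UTF8.GetBytes(text));

    public static void Main()
    {
        string text = "شكرا"; // 4 Arabic letters, 8 bytes in UTF-8

        Console.WriteLine(ToHex(text).Length);    // 16
        Console.WriteLine(ToBase64(text).Length); // 12
    }
}
```

Base64 is more compact, but if the service on the other end of the socket expects hex, the bytes have to be sent as hex.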
Faraz Ahmed