Convert @ sign from byte in GSM 7-bit encoding to Java text

Question

I have been given a byte array [97, 98, 0, 99, 100] that is GSM 7-bit encoded. It should convert to ab@cd. When I append the bytes of this array to a StringBuilder, the @ sign is not converted.

Here is my code:

byte[] byteFinal ={97, 98, 0, 99, 100};
char ch;
StringBuilder str = new StringBuilder();
for(byte b : byteFinal){
    ch  = (char)b;
    System.out.println("ch:"+ch);
    str.append(ch);

}
System.out.println(str.toString());

Pretty sure 0 is char code for `NUL` non-printable string terminator, **not** `@` character — Trash Can, Dec 17 '18 at 06:50
exactly, but then how can I convert this kind of byte array into coherent text? — Haim Klainman, Dec 17 '18 at 06:53
@Dummy a bit off topic: I don't think `NUL` is a string terminator in Java. — Adrian Shum, Dec 17 '18 at 07:03
@Adrian Shum, `NUL` whose ASCII code is 0 is the set-in-stone character used to signify the end of string in computer memory, pretty sure every language created after C handles `NUL` in strings for you behind the scene. Show proof that Java uses a different character code to represent end of string — Trash Can, Dec 17 '18 at 08:00
@Dummy it is simply wrong: NUL simply is a value for character. C-Style string use NUL as a terminator, but that's not what it was designed for. And, a lot of languages do NOT use null-termination scheme to represent strings. For example, in Java, it is internally stored as character array and length of string is simply length of the array, and there is no extra '\0' at the end. (it is simplified view, coz String in Java created from substring could reuse char array storage). Even C++ std::string does not work as you think: https://akrzemi1.wordpress.com/2014/03/20/strings-length/ — Adrian Shum, Dec 17 '18 at 09:08
and, even starting from C++1x, `std::string` requires null-termination internally (https://stackoverflow.com/questions/6077189/will-stdstring-always-be-null-terminated-in-c11) it is more a design decision to simplify `c_str()` and concurrency issues, instead of relying NULL as string terminator (i.e. you can still have `\0` WITHIN the std::string) — Adrian Shum, Dec 17 '18 at 09:14
@HaimKlainman just a suggestion: your question is actually a valid one but you should have mentioned that the original byte array is in GSM 7 Bit encoding. — Adrian Shum, Dec 17 '18 at 11:32
@HaimKlainman I right now also had the problem and found a nice library. See my answer below: https://stackoverflow.com/a/71947804/3351474 — k_o_, Apr 21 '22 at 01:04

Adrian Shum · Accepted Answer · 2018-12-18T09:36:50.970

Based on your comments in other answers, the problem is caused by missing handling of GSM 7-bit encoding.

You can treat GSM 7-bit as a separate character encoding. You shouldn't take a byte array in that encoding as-is and cast each byte to char. Casting a byte to char only works if the bytes are in ASCII, or in an ASCII-compatible encoding such as UTF-8 or ISO-8859-1, and every character is below code point 128.
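To make that concrete, here is a small sketch of what the plain cast does. The class name `CastDemo` and the method `castEach` are made up for illustration; the two arrays are the ASCII and GSM 7-bit forms of the same text discussed in this thread.

```java
import java.nio.charset.StandardCharsets;

class CastDemo {
    // Mimics the loop from the question: every byte is cast straight to a char.
    static String castEach(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append((char) b);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] ascii = {97, 98, 64, 99, 100}; // "ab@cd" as ASCII
        byte[] gsm = {97, 98, 0, 99, 100};    // "ab@cd" as GSM 7-bit septets

        // For pure ASCII input the cast agrees with a real charset decode.
        System.out.println(castEach(ascii).equals(new String(ascii, StandardCharsets.US_ASCII)));

        // For the GSM input, septet 0 turns into U+0000 (NUL) instead of '@'.
        System.out.println((int) castEach(gsm).charAt(2));
    }
}
```

The letters a to z happen to share their code values between ASCII and GSM 7-bit, which is why only the @ looks broken in the question.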

It seems Java does not provide a built-in Charset for GSM 7-bit (else, you could have done something like String result = new String(byteFinal, GSM_7_BIT_CHARSET);).

You need to handcraft the logic, which looks something like this (based on https://mnujali.wordpress.com/2011/12/01/gsm-7-bit-encodingdecoding-used-for-sms-and-ussd-strings-java-code/):

static final char[] GSM7CHARS = {
        0x0040, 0x00A3, 0x0024, 0x00A5, 0x00E8, 0x00E9, 0x00F9, 0x00EC,
        0x00F2, 0x00C7, 0x000A, 0x00D8, 0x00F8, 0x000D, 0x00C5, 0x00E5, // index 9 is capital C with cedilla (U+00C7)
        0x0394, 0x005F, 0x03A6, 0x0393, 0x039B, 0x03A9, 0x03A0, 0x03A8,
        0x03A3, 0x0398, 0x039E, 0x00A0, 0x00C6, 0x00E6, 0x00DF, 0x00C9,
        0x0020, 0x0021, 0x0022, 0x0023, 0x00A4, 0x0025, 0x0026, 0x0027,
        0x0028, 0x0029, 0x002A, 0x002B, 0x002C, 0x002D, 0x002E, 0x002F,
        0x0030, 0x0031, 0x0032, 0x0033, 0x0034, 0x0035, 0x0036, 0x0037,
        0x0038, 0x0039, 0x003A, 0x003B, 0x003C, 0x003D, 0x003E, 0x003F,
        0x00A1, 0x0041, 0x0042, 0x0043, 0x0044, 0x0045, 0x0046, 0x0047,
        0x0048, 0x0049, 0x004A, 0x004B, 0x004C, 0x004D, 0x004E, 0x004F,
        0x0050, 0x0051, 0x0052, 0x0053, 0x0054, 0x0055, 0x0056, 0x0057,
        0x0058, 0x0059, 0x005A, 0x00C4, 0x00D6, 0x00D1, 0x00DC, 0x00A7,
        0x00BF, 0x0061, 0x0062, 0x0063, 0x0064, 0x0065, 0x0066, 0x0067,
        0x0068, 0x0069, 0x006A, 0x006B, 0x006C, 0x006D, 0x006E, 0x006F,
        0x0070, 0x0071, 0x0072, 0x0073, 0x0074, 0x0075, 0x0076, 0x0077,
        0x0078, 0x0079, 0x007A, 0x00E4, 0x00F6, 0x00F1, 0x00FC, 0x00E0};

static final char[] ESCAPE = {
        0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000,
        0x0000, 0x0000, '\f'  , 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, // escape + 0x0A is a form feed (page break)
        0x0000, 0x0000, 0x0000, 0x0000, '^'   , 0x0000, 0x0000, 0x0000,
        0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000,
        0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000,
        '{'   , '}'   , 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, '\\'  , // backslash is escape + 0x2F (index 47)
        0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000,
        0x0000, 0x0000, 0x0000, 0x0000, '['   , '~'   , ']'   , 0x0000,
        '|'   , 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000,
        0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000,
        0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000,
        0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000,
        0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x20AC, 0x0000, 0x0000,
        0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000,
        0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000,
        0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000};
        // 0x0000 marks septets that have no entry in the extension table

//...

byte[] byteFinal ={97, 98, 0, 99, 100};
StringBuilder sb = new StringBuilder();
boolean escape = false;
for(byte b : byteFinal){
    if (b >= 0) {
        if (escape) {
            sb.append(ESCAPE[b] > 0 ? ESCAPE[b] : GSM7CHARS[b]);
            escape = false;
        } else {
            if (b == 27) {  // escape
                escape = true;
            } else { 
                sb.append(GSM7CHARS[b]);
            }
        }
    }
}
System.out.println(sb.toString());

Update 1:

With some searching, it seems GSM 7-bit encoding is a bit more complicated than what is implemented above (e.g. escaping): https://www.developershome.com/sms/gsmAlphabet.asp

However, this at least gives you an idea of why you need a handcrafted lookup instead of just casting each byte to char.
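As a compact, self-contained variant of the lookup idea, the sketch below writes the basic table as a 128-character string and the extension table as a `switch`. The names `Gsm7Sketch`, `BASIC`, `extension` and `decode` are made up for illustration, and the tables follow the GSM 03.38 default alphabet as I understand it, so check them against the specification before relying on them. It assumes the input is already unpacked, one septet per byte, and it masks each byte to 7 bits rather than skipping negative values.

```java
class Gsm7Sketch {
    // Basic character set, indexed by septet value 0..127.
    // Index 27 is the escape code; a no-break space is stored there as a placeholder.
    static final String BASIC =
          "@\u00A3$\u00A5\u00E8\u00E9\u00F9\u00EC\u00F2\u00C7\n\u00D8\u00F8\r\u00C5\u00E5"
        + "\u0394_\u03A6\u0393\u039B\u03A9\u03A0\u03A8\u03A3\u0398\u039E\u00A0\u00C6\u00E6\u00DF\u00C9"
        + " !\"#\u00A4%&'()*+,-./"
        + "0123456789:;<=>?"
        + "\u00A1ABCDEFGHIJKLMNO"
        + "PQRSTUVWXYZ\u00C4\u00D6\u00D1\u00DC\u00A7"
        + "\u00BFabcdefghijklmno"
        + "pqrstuvwxyz\u00E4\u00F6\u00F1\u00FC\u00E0";

    // Extension table: the character selected by "escape + septet", or 0 if there is none.
    static char extension(int septet) {
        switch (septet) {
            case 0x0A: return '\f';
            case 0x14: return '^';
            case 0x28: return '{';
            case 0x29: return '}';
            case 0x2F: return '\\';
            case 0x3C: return '[';
            case 0x3D: return '~';
            case 0x3E: return ']';
            case 0x40: return '|';
            case 0x65: return '\u20AC'; // euro sign
            default:   return 0;
        }
    }

    // Decodes unpacked septets (one per byte) into a Java String.
    static String decode(byte[] septets) {
        StringBuilder sb = new StringBuilder();
        boolean escaped = false;
        for (byte b : septets) {
            int s = b & 0x7F; // keep only the low 7 bits
            if (escaped) {
                char ext = extension(s);
                // Unknown escape sequences fall back to the basic-table character.
                sb.append(ext != 0 ? ext : BASIC.charAt(s));
                escaped = false;
            } else if (s == 0x1B) {
                escaped = true; // next septet comes from the extension table
            } else {
                sb.append(BASIC.charAt(s));
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(decode(new byte[]{97, 98, 0, 99, 100}));
    }
}
```

The string form keeps the table short enough to review by eye; an array, as in the code above, works just as well.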

Update 2:

It seems someone has implemented charset for GSM 7 bit: https://github.com/OpenSmpp/opensmpp/blob/master/charset/src/main/java/org/smpp/charset/Gsm7BitCharset.java

By using it, you can simply do something like String result = new String(byteFinal, GSM_7_BIT_CHARSET); without struggling with all those internals of GSM 7 bit

this is the most efficient answer. I looked at the link you added and saw many -1s there that you omitted? — Haim Klainman, Dec 17 '18 at 11:24
Your need is slightly different. Anyway, given that gsm 7bit is going to give you bytes of value [0, 127], the above logic should work. Those -1s seems only serve for purpose of invalid value. — Adrian Shum, Dec 17 '18 at 11:27
your above example works in the real world app perfect (so far). — Haim Klainman, Dec 18 '18 at 08:01
fortunately, right now I only need to unpack the received SMS and convert the received byte array to a 7-bit byte array. I had already done that part, so the last piece of the puzzle was converting the 7-bit byte array to text, and static final char[] GSM7CHARS solves this issue. — Haim Klainman, Dec 18 '18 at 08:05
@HaimKlainman no it won't work if you encounter the escape sequences (e.g. `^ { } \ [ ] | ` etc). And other problem is that the original code I copied from is wrong for the value of character > 127. I strongly suggest you make use of the Charset well-tested by others, or, at least add some extra logic dealing with escape sequences. Anyway, if this answer helped you to solve your problem, please consider accepting it. — Adrian Shum, Dec 18 '18 at 08:18
if you insist not to use the Charset, I have updated the code to handle escape sequences of GSM 7-Bit — Adrian Shum, Dec 18 '18 at 08:39
as quoted in answer: https://www.developershome.com/sms/gsmAlphabet.asp There is escaping scheme in GSM 7-bit — Adrian Shum, Dec 18 '18 at 09:43
Ok, now I've got it. But if I look up the ^ sign, which is 27 20 in the table in the link, it does not correspond to the position of that character in `char[] ESCAPE`. I mean, it looks like all of the escape characters need to shift one position left? — Haim Klainman, Dec 18 '18 at 10:05
@HaimKlainman array index in Java is zero-based. And, to be honest, I haven't tested those code and they are written in an "impromptu" manner. You should just treat that as pseudo code and write your own after digesting the idea. And, again, if it is the answer that solve your problem, please accept the answer — Adrian Shum, Dec 18 '18 at 10:23
I've checked your last update in a production environment and it is catching the escape characters. — Haim Klainman, Dec 18 '18 at 12:49
There is also this lib: https://github.com/brake/telecom-charsets — user432024, Nov 24 '20 at 21:59

score 3 · Answer 2 · answered Dec 17 '18 at 06:53


Change array to:

byte[] byteFinal ={97, 98, 64, 99, 100};

The ASCII code of '@' is 64. Incidentally, the caret notation for the NUL character (ASCII code 0) is ^@, which seems to have confused you here.


Gyapti Jain


Good answer mentioning caret notation. I was initially clueless why OP could confuse `@` with `NUL` :P – Adrian Shum Dec 17 '18 at 07:06
OK, for the full picture of the issue: I'm actually receiving base64 data that was originally encoded as packed GSM 7-bit, and I need to unpack it to readable text. Now, if I isolate only the @ sign, its base64 representation is AA==, whose hex value is 00 (double zero). The byte representation of hex 00 is 0, and thus I'm receiving this annoying zero... – Haim Klainman Dec 17 '18 at 08:04
@HaimKlainman That's strange: a single `@` should be `QA==` in base64. I think there is something wrong for your data. – Adrian Shum Dec 17 '18 at 09:18
@Adrian Shum it is true, but to decode base64 properly you need to know how the original message was encoded. In this case the original message was encoded to 7-bit hex and then to base64, so the hex value decoded back from base64 is 00, which is [0] in bytes -> which is @ in italics – Haim Klainman Dec 17 '18 at 10:27
Oh! I think I got what you mean by GSM 7 bit. You can treat that as a totally different character encoding scheme. It is just like, you are converting a EBCDIC byte array to string as is, and complain Java giving you the wrong result. The question has nothing to do with base64 actually – Adrian Shum Dec 17 '18 at 10:38
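Since the data in this thread arrives base64-encoded and packed, here is a rough sketch of the unpacking step that comes before any character lookup. It assumes the usual SMS packing, where septets are laid into octets starting from the least significant bit, and it assumes the septet count is known separately. `SeptetUnpacker` and `unpack` are made-up names, not part of any library mentioned here.

```java
import java.util.Arrays;
import java.util.Base64;

class SeptetUnpacker {
    // Expands packed GSM 7-bit data into one septet (0..127) per output byte.
    static byte[] unpack(byte[] packed, int septetCount) {
        if (septetCount * 7 > packed.length * 8) {
            throw new IllegalArgumentException("not enough packed data for " + septetCount + " septets");
        }
        byte[] out = new byte[septetCount];
        for (int i = 0; i < septetCount; i++) {
            int bitPos = i * 7;       // position of this septet's lowest bit
            int byteIdx = bitPos / 8; // octet that holds that bit
            int shift = bitPos % 8;   // offset of that bit inside the octet
            int value = (packed[byteIdx] & 0xFF) >> shift;
            // When the septet straddles two octets, pull the high bits from the next one.
            if (shift > 1 && byteIdx + 1 < packed.length) {
                value |= (packed[byteIdx + 1] & 0xFF) << (8 - shift);
            }
            out[i] = (byte) (value & 0x7F);
        }
        return out;
    }

    public static void main(String[] args) {
        // "AA==" is the base64 value quoted in the comments; it decodes to the single octet 0x00.
        byte[] packed = Base64.getDecoder().decode("AA==");
        System.out.println(Arrays.toString(unpack(packed, 1)));
    }
}
```

The resulting septets are exactly the kind of array shown in the question, so they can be fed to a GSM 7-bit lookup or charset afterwards.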

SMA · Answer 3 · 2018-12-17T06:59:36.347

You are using ascii values of characters in your byte array.

Here 64 corresponds to ascii value of '@' character that you are after.

Hence your array should be:

byte[] byteFinal ={97, 98, 64, 99, 100};
                           ^^

According to Wikipedia, ASCII value 0 corresponds to the null character.

Also, you can build the String directly instead of using a StringBuilder:

System.out.println(new String(byteFinal));

So all you need is two lines of code like:

byte[] byteFinal ={97, 98, 64, 99, 100};
System.out.println(new String(byteFinal));

score 0 · Answer 4 · answered Dec 17 '18 at 07:00

The ASCII value of @ is 64 (see Wikipedia).

Rest of your code is fine!

byte[] byteFinal ={97, 98, 64, 99, 100};
char ch;
StringBuilder str = new StringBuilder();
for(byte b : byteFinal){
    ch  = (char)b;
    System.out.println("ch:"+ch);
    str.append(ch);

}
System.out.println(str.toString());

score 0 · Answer 5 · answered Apr 03 '19 at 13:44


You can also install the charset in the lib and use getBytes("SCGSM")


loser8


score 0 · Answer 6 · answered Apr 21 '22 at 01:02

There is the library jCharset. When the library is on the class path it will be automatically added to the available charsets.

import java.io.UnsupportedEncodingException;

class Scratch {
    public static void main(String[] args) throws UnsupportedEncodingException {
        byte[] encoded = "something".getBytes("GSM7");
        System.out.println(new String(new byte[]{97, 98, 0, 99, 100}, "GSM7"));
    }
}

Output:

ab@cd

Here are the Maven coordinates.
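Whether a charset name such as "GSM7" resolves depends on what is on the class path, so it can be worth checking before decoding. The sketch below uses only the standard `java.nio.charset.Charset` API; the helper name `decodeIfSupported` and the class `CharsetLookup` are made up for illustration, and "GSM7" is simply the name used in the snippet above.

```java
import java.nio.charset.Charset;
import java.util.Optional;

class CharsetLookup {
    // Decodes with the named charset if the running JVM knows it, otherwise returns empty.
    static Optional<String> decodeIfSupported(byte[] data, String charsetName) {
        if (!Charset.isSupported(charsetName)) {
            return Optional.empty();
        }
        return Optional.of(new String(data, Charset.forName(charsetName)));
    }

    public static void main(String[] args) {
        byte[] septets = {97, 98, 0, 99, 100};
        // Prints the decoded text only when a provider for "GSM7" is installed.
        System.out.println(decodeIfSupported(septets, "GSM7")
                .orElse("GSM7 charset not available on this class path"));
    }
}
```

This avoids an `UnsupportedEncodingException` at the call site and leaves room for a fallback such as a handwritten lookup table.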
