The character set used for strings in AT commands is controlled by AT+CSCS
. The default value is "GSM"
which is not capable of displaying anything outside a relative limited set of characters.
In your case, to read Arabic or Chinese "UTF-8"
is probably the best choice, although "UCS-2"
also can be used (will require a little post processing though).
Below you can see how the selected character set affects strings. I have kept the phone number to my Chinese teacher from when I lived in Taiwan, stored as "teacher" in Chinese (lǎo shī). The actual phone number is stripped out here, but otherwise the following is a verbatim copy of the responses from my phone:
$ echo at+cscs? | atinout - /dev/ttyACM0 -
+CSCS: "GSM"
OK
$ echo at+cpbr=403 | atinout - /dev/ttyACM0 -
+CPBR: 403,"",145,"??/M"
OK
$ echo at+cscs=? | atinout - /dev/ttyACM0 -
+CSCS: ("GSM","IRA","8859-1","UTF-8","UCS2")
OK
$ echo 'at+cscs="UTF-8"' | atinout - /dev/ttyACM0 -
OK
$ echo at+cscs? | atinout - /dev/ttyACM0 -
+CSCS: "UTF-8"
OK
$ echo at+cpbr=403 | atinout - /dev/ttyACM0 -
+CPBR: 403,"",145,"老師/M"
OK
$ echo 'at+cscs="UCS2"; +cpbr=403' | atinout - /dev/ttyACM0 -
+CPBR: 403,"",145,"80015E2B002F004D"
OK
$ echo 'at+cscs=?' | atinout - /dev/ttyACM0 -
+CSCS: ("00470053004D","004900520041","0038003800350039002D0031","005500540046002D0038","0055004300530032")
OK
$ echo 'at+cscs="005500540046002D0038"' | atinout - /dev/ttyACM0 -
OK
$ echo 'at+cscs=?' | atinout - /dev/ttyACM0 -
+CSCS: ("GSM","IRA","8859-1","UTF-8","UCS2")
OK
Update, upon checking 27.007, the string for the +CUSD: <m>[,<str>,<dcs>]
unsolicited result code is not a regular string, but has its own encoding:
<str>: string type USSD-string (when <str> parameter is not given,
network is not interrogated):
- if <dcs> indicates that 3GPP TS 23.038 [25] 7 bit default alphabet is used:
- if TE character set other than "HEX" (refer command Select TE Character
Set +CSCS): MT/TA converts GSM alphabet into current TE character set
according to rules of 3GPP TS 27.005 [24] Annex A
- if TE character set is "HEX": MT/TA converts each 7-bit character of GSM
alphabet into two IRA character long hexadecimal number (e.g. character
Π (GSM 23) is presented as 17 (IRA 49 and 55))
- if <dcs> indicates that 8-bit data coding scheme is used: MT/TA converts each
8-bit octet into two IRA character long hexadecimal number (e.g. octet with
integer value 42 is presented to TE as two characters 2A (IRA 50 and 65))
<dcs>: 3GPP TS 23.038 [25] Cell Broadcast Data Coding Scheme in integer format
(default 0)
You therefore have to first determine if dcs is 7 or 8 bit, and then decode according to the above.
PS, the "USC2 0x81" format is described here. although it should not behave differently from plain UCS2 in this particular case.