43

I want to use this unicode character in my resource file.

But whatever I do, I end with dalvikvm crash (tested with Android 2.3 and 4.2.2):

W/dalvikvm( 8797): JNI WARNING: input is not valid Modified UTF-8: illegal start byte 0xf0
W/dalvikvm( 8797):              string: ''
W/dalvikvm( 8797):              in Landroid/content/res/StringBlock;.nativeGetString:(II)Ljava/lang/String; (NewStringUTF)
E/dalvikvm( 8797): VM aborting
F/libc    ( 8797): Fatal signal 11 (SIGSEGV) at 0xdeadd00d (code=1), thread 8797 (cz.ipex...)

I tried these version in my resource file:

<string name="geolocation_icon" translatable="false">&#x1f4e1;</string> <!-- HTML -->
<string name="geolocation_icon" translatable="false">\uD83D\uDCE1</string> <!-- escaped unicode -->
<string name="geolocation_icon" translatable="false"></string> <!-- unicode character -->

Note that using it in Java String in code works ok:

final String geolocation_icon = "\uD83D\uDCE1";
Pitel
  • 5,334
  • 7
  • 45
  • 72
  • Did you try to use UTF-8 in your XML file? Using hexdump, can you confirm that it is encoded as `0xF0 0x9F 0x93 0xA1` sequence? – mvp May 28 '13 at 08:16
  • Yes, it is, and if you look at the error output above, you see it is complaining about it: `input is not valid Modified UTF-8: illegal start byte 0xf0` – Pitel May 28 '13 at 09:08

2 Answers2

56

Your character (U+1F4E1) is outside of Unicode BMP (Basic Multilingual Plane - range from U+0000 to U+FFFF).

Unfortunately, Android has very weak (if any) support for non-BMP characters. UTF-8 representation for non-BMP characters requires 4 bytes (0xF0 0x9F 0x93 0xA1). But, Android UTF-8 parser only understands 3 bytes maximum (see it here and here).

It works for you when you use UTF-16 surrogate form representation of this character: "\uD83D\uDCE1". If you were able to encode each surrogate UTF-16 character in modified UTF-8 (aka CESU-8) - it would take 6 bytes total (3 bytes in UTF-8 for each member of surrogate pair), then it would be possible. But, Android does not support CESU-8 explicitly either.

So, your current solution - hard-coding this symbol in source code as surrogate UTF-16 pair seems easiest, at least until Android starts fully supporting non-BMP UTF-8.

UPDATE: this seems to be partially fixed in Android 6.0. This commit has been merged into Android 6, and permits presence of 4-byte UTF-8 characters in XML resources. Its not perfect solution - it will simply automatically convert 4-byte UTF-8 into appropriate surrogate pair. However, it allows to move them from your source code into XML resources. Unfortunately, you can't use this solution until your application can stop supporting any Android version except for 6.0 and later.

mvp
  • 111,019
  • 13
  • 122
  • 148
  • This is very frustrating as it makes the Class file much larger than it needs to be. It would be ideal to separate these unicode strings into an `arrays.xml` or `strings.xml` file. – Etienne Lawlor Jun 10 '13 at 03:44
  • 2
    @toobsco42: you might want to [file a bug](http://source.android.com/source/report-bugs.html) in [Android bug database](https://code.google.com/p/android/issues). – mvp Mar 19 '14 at 07:00
  • hi, i have an xml file with surrogate pairs like "\uD83D\uDCE1" but i'm getting errors when trying to parse the file, could you help me ? – SWAppDev Nov 26 '15 at 20:56
  • 1
    @SWAppDev: you can use surrogate pairs in Java source code, but not in resource XML. – mvp Nov 27 '15 at 06:25
  • This [website](http://www.russellcottrell.com/greek/utilities/surrogatepaircalculator.htm) lets you calculate the surrogate pairs. – SoftWyer Dec 10 '16 at 19:59
  • I use it on a menu and it still doesn't work from strings.xml,,nor from the menu xml file. I have to put it in Java source code and reset the menu. On Android 9 – LXJ Mar 31 '21 at 20:16
0

Do it this way

Do not keep problematic emoji in the strings.xml

add it programmatically

<string name="hi_welcome_msg">Hi %1$s</string>

getString(R.string.hi_welcome_msg, user.getFullName() + " \uD83D\uDC4B" );
Imran Baig
  • 357
  • 3
  • 4