Get a UTF-16 string length from memory in Python

Question

I need to read a UTF-16 encoded string that is stored in memory, from a Python script for LLDB. According to the LLDB documentation I can use ReadMemory(address, length, error), but that requires knowing the length in advance. Without it, Python's decode function fails when it hits a character it cannot decode (even with the 'ignore' option) and the process stops:

UnicodeEncodeError: 'ascii' codec can't encode character u'\u018e' in position 12: ordinal not in range(128)

Can anyone suggest a way of achieving this? (either using a "python" or "lldb python" implementation). I don't have the original string's length.

Thanks.

Can you show your code? It's great that you show the error, but please show the full traceback and the sample code that raises it. – David Zemens Feb 19 '16 at 02:00
There are many ways to represent strings in memory. Does their doc tell you how they do it? – tdelaney Feb 19 '16 at 02:20
Here is a memory dump example of what I need to parse: `(lldb) memory read 0x10142c838 0x10142c838: 61 00 62 00 63 00 64 00 65 00 00 00 00 00 00 00 a.b.c.d.e....... 0x10142c848: 00 00 00 00 00 00 00 00 8e 01 00 00 00 00 00 00 ................` It seems to be a UTF-16-LE encoded string, but I'm not sure it's always null-terminated. I hope this gives a bit more insight. – Anubis Feb 19 '16 at 15:38

score 2 · Answer 1 · answered Feb 19 '16 at 02:13

Is the string 0-terminated? If so, you could read 2 bytes at a time, until you encounter 0x0000, and then you'd know you have a complete string.

If you do this, you'd want to give yourself a constraint (e.g. "I will give up after reading - say - 1MB of data", in case you're running into corrupted memory).
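A minimal sketch of that approach, assuming the string is UTF-16-LE and terminated by a 0x0000 code unit. The helper name `read_utf16le_string` and the `read_memory(address, size)` callable are hypothetical. In an LLDB script the callable could presumably wrap `process.ReadMemory(address, size, error)`; here it is backed by a copy of the memory dump from the comments so the example runs on its own.

```python
def read_utf16le_string(read_memory, address, max_bytes=1024 * 1024):
    """Read 2 bytes at a time until a 0x0000 code unit, then decode.

    read_memory is a hypothetical callable: (address, size) -> bytes.
    Raises ValueError if no terminator is found within max_bytes.
    """
    raw = bytearray()
    while len(raw) < max_bytes:
        unit = bytearray(read_memory(address + len(raw), 2))
        if len(unit) < 2:
            raise ValueError("short read before a terminator was found")
        if unit == b"\x00\x00":
            # 'replace' keeps unpaired surrogates from aborting the decode
            return bytes(raw).decode("utf-16-le", "replace")
        raw += unit
    raise ValueError("no terminator within %d bytes" % max_bytes)


# Stand-in for process memory, copied from the dump in the comments.
BASE = 0x10142C838
DUMP = bytes.fromhex(
    "61 00 62 00 63 00 64 00 65 00 00 00 00 00 00 00"
    "00 00 00 00 00 00 00 00 8e 01 00 00 00 00 00 00"
)


def fake_read(address, size):
    # Assumption: mimics a memory reader by slicing the dump above.
    offset = address - BASE
    return DUMP[offset:offset + size]


print(read_utf16le_string(fake_read, BASE))  # abcde
```

The length in bytes is then simply the number of bytes consumed before the terminator. The `max_bytes` cap is the "give up" constraint: without it, a missing terminator would keep the loop walking through unrelated memory.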

Enrico Granata


I thought so too, but apparently there is no defined null [termination](http://stackoverflow.com/questions/5923948/utf-16-string-terminator). Is there any function that evaluates this? – Anubis Feb 19 '16 at 11:06
So if I understand, your task is to read a string whose length you don't know and with no known termination? That is a very badly defined problem. What if your valid string is followed by garbage that looks like characters? Are you OK with overly-aggressive printing? Are all characters in the string even going to be printable? – Enrico Granata Feb 19 '16 at 18:43
