I wrote a basic Python program to parse Android's resources.arsc. It prints all strings found in the file. The strings have a zero-valued byte between each character, which suggests to me that they are stored in UTF-16. I don't know if that is correct, but Android strings are localizable, so I think it is. I am using string.decode('hex') to print the string in human-readable form. Here's a sample with the list of bytes that make up one string:

>>> print ''.join(['00', '72', '00', '65', '00', '73', '00', '2f', '00', '64', '00', '72',    '00', '61', '00', '77', '00', '61', '00', '62', '00', '6c', '00', '65', '00', '2f', '00', '61', '00', '62', '00', '6f', '00', '75', '00', '74', '00', '2e', '00', '70', '00', '6e', '00', '67', '00', '00', '00']).decode('hex')
res/drawable/about.png

The issue is that when I pipe this program's output to grep, I cannot grep for any of the strings it reads. How can I print them to the shell so that grep can match them in the output? Thanks!

(EDIT) I did indeed print the string, but in my example I thought it would be better to show both the 'print'ed version and the returned version. Sorry for the confusion. In this example, it is 'res/drawable/about.png' that cannot be grepped.

(EDIT2) A simple demonstration:

11:33 AM ~/learning_python $ python -c "print ''.join(['00', '72', '00', '65', '00', '73', '00', '2f', '00', '64', '00', '72',    '00', '61', '00', '77', '00', '61', '00', '62', '00', '6c', '00', '65', '00', '2f', '00', '61', '00', '62', '00', '6f', '00', '75', '00', '74', '00', '2e', '00', '70', '00', '6e', '00', '67', '00', '00', '00']).decode('hex')"
res/drawable/about.png
11:33 AM ~/learning_python $ python -c "print ''.join(['00', '72', '00', '65', '00', '73', '00', '2f', '00', '64', '00', '72',    '00', '61', '00', '77', '00', '61', '00', '62', '00', '6c', '00', '65', '00', '2f', '00', '61', '00', '62', '00', '6f', '00', '75', '00', '74', '00', '2e', '00', '70', '00', '6e', '00', '67', '00', '00', '00']).decode('hex')" | grep about
11:33 AM ~/learning_python $ 

(EDIT3) Another demonstration; I think this shows the data is UTF-16-BE:

11:33 AM ~/learning_python $ python -c "print ''.join(['00', '72', '00', '65', '00', '73', '00', '2f', '00', '64', '00', '72',    '00', '61', '00', '77', '00', '61', '00', '62', '00', '6c', '00', '65', '00', '2f', '00', '61', '00', '62', '00', '6f', '00', '75', '00', '74', '00', '2e', '00', '70', '00', '6e', '00', '67', '00', '00', '00']).decode('hex')" > testfile
11:35 AM ~/learning_python $ iconv -f utf16be -t utf8 testfile
res/drawable/about.png
11:35 AM ~/learning_python $ iconv -f utf16be -t utf8 testfile | grep about
Binary file (standard input) matches
11:35 AM ~/learning_python $ iconv -f utf16be -t utf8 testfile | grep -a about
res/drawable/about.png
raidzero
  • Did you "print" the decoded string? – Patrick Oct 18 '12 at 17:12
  • Yes, that's how the final string was produced. I edited my question for clarity. – raidzero Oct 18 '12 at 17:14
  • Put it all in a list and you will see the problem: python -c "print [''.join(['00', '72', '00', '65', '00', '73', '00', '2f', '00', '64', '00', '72', '00', '61', '00', '77', '00', '61', '00', '62', '00', '6c', '00', '65', '00', '2f', '00', '61', '00', '62', '00', '6f', '00', '75', '00', '74', '00', '2e', '00', '70', '00', '6e', '00', '67', '00', '00', '00']).decode('hex')]" – jassinm Oct 18 '12 at 18:03
  • Possible duplicate of [grepping binary files and UTF16](https://stackoverflow.com/questions/3752913/grepping-binary-files-and-utf16) – kenorb Jan 17 '19 at 13:11

2 Answers

Decode the bytes as UTF-16-BE:

'\x00r\x00e\x00s'.decode('utf-16-be') # produces u'res'

Then you can print out the decoded string:

$ python -c "print ''.join(['00', '72', '00', '65', '00', '73', '00', '2f', '00', '64', '00', '72',    '00', '61', '00', '77', '00', '61', '00', '62', '00', '6c', '00', '65', '00', '2f', '00', '61', '00', '62', '00', '6f', '00', '75', '00', '74', '00', '2e', '00', '70', '00', '6e', '00', '67', '00', '00', '00', '00']).decode('hex').decode('utf-16-be').rstrip('\0')" | grep about
res/drawable/about.png
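
The same steps can be wrapped in a small helper. This is only a sketch: the name hex_pairs_to_text is made up for illustration, it uses binascii.unhexlify in place of str.decode('hex') so that it also runs on Python 3, and it drops a dangling odd byte instead of padding the list with an extra '00' as the one-liner above does.

```python
import binascii


def hex_pairs_to_text(hex_pairs, encoding='utf-16-be'):
    """Turn a list of two-digit hex strings into text without NUL padding.

    Hypothetical helper for illustration; not part of any library.
    """
    raw = binascii.unhexlify(''.join(hex_pairs))
    # UTF-16 uses two bytes per code unit, so an odd trailing byte
    # cannot be decoded; drop it (assumption: it is just padding).
    if len(raw) % 2:
        raw = raw[:-1]
    # Strip the NUL terminator(s) so no zero bytes reach the output.
    return raw.decode(encoding).rstrip(u'\x00')


pairs = ['00', '72', '00', '65', '00', '73', '00', '00', '00']
print(hex_pairs_to_text(pairs))  # prints: res
```

Once the NUL bytes are gone, the printed text is plain ASCII here. On Python 2, text containing non-ASCII characters would probably also need an explicit .encode('utf-8') before printing into a pipe.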
nneonneo
  • Thanks for that... I can now grep the output of my program, but I have to use the -a switch. I can live with that :) – raidzero Oct 18 '12 at 18:09

Use the ripgrep utility instead of grep; it can search UTF-16 files.

ripgrep supports searching files in text encodings other than UTF-8, such as UTF-16, latin-1, GBK, EUC-JP, Shift_JIS and more. (Some support for automatically detecting UTF-16 is provided. Other text encodings must be specifically specified with the -E/--encoding flag.)

Example syntax:

rg sometext file
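
If ripgrep is not available, a rough Python counterpart is to encode the search term as UTF-16-BE and look for those bytes in the raw data. This is a different approach from ripgrep's transcoding, shown only as a sketch; find_utf16be is a made-up name, and the sample bytes are built in place rather than read from a real resources.arsc.

```python
def find_utf16be(data, needle):
    """Return the byte offsets where `needle` occurs as UTF-16-BE in `data`.

    Hypothetical helper for illustration; not part of any library.
    """
    pattern = needle.encode('utf-16-be')
    offsets = []
    start = data.find(pattern)
    while start != -1:
        offsets.append(start)
        # Continue searching one byte past the previous hit.
        start = data.find(pattern, start + 1)
    return offsets


# Sample data: the string from the question plus a NUL terminator.
sample = u'res/drawable/about.png'.encode('utf-16-be') + b'\x00\x00'
print(find_utf16be(sample, u'about'))  # prints: [26]
```

Searching the raw bytes avoids decoding the whole file, but it can in principle report a match that straddles two neighbouring characters at an odd offset, so treat the hits as candidates.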
kenorb