
I am trying to capture the output of a Bash command and work with it in Python as bytes. After formatting, the output of the command looks something like this: \x48\x83\xec\x08\x48\x8b\x05\xdd

Python then receives it through sys.argv, but for it to be recognised as bytes I need to call .encode() on it. When I encode it I get b'\\x48\\x83\\xec\\x08\\x48\\x8b\\x05\\xdd'. According to what I've read, \\ is the representation of a single backslash, but I need the value with a single backslash and not two.

I've tried different solutions, such as encoding it and decoding it again with 'unicode_escape' as suggested in "Remove double back slashes", to no avail.

Surely I am missing some knowledge here. Any help would be really appreciated.

#!/bin/sh
echo "Enter an executable name."
read varname
echo "Enter a PID to search in memory."
read PID
# Disassemble .text, keep the hex-byte column, prefix each byte with \x,
# then strip all whitespace so the result is a single argument.
byteString=$(objdump -d -j .text "/bin/$varname" | head -n100 | tail -n93 |
cut -c11-30 | sed 's/[a-z0-9]\{2\}/\\x&/g' | tr -d '[:space:]')
python3 /home/internship/Desktop/memory_analysis.py "$PID" "$byteString"

Above is the shell script with the command that produces the bytes. The following is how Python receives them.

#!/usr/bin/python3
import sys

# argv[1] is the PID and argv[2] the byte string, so both are required.
if len(sys.argv) < 3:
    print("Please specify a PID and a byte string")
    sys.exit(1)
element = bytes(sys.argv[2].encode())
print(element)

output - b'\\xe8\\x5b\\xfd\\xff\\xff\\xe8\\x56\\xfd\\xff\\xff\\xe8\\x51\\xfd\\xff\\xff\\xe8\\x4c\\xfd\\xff\\xff\\xe8\\x47\\xfd\\xff\\xff'

When I hard-code it in a variable it works just fine, for example: element = b'\xe8\x5b\xfd\xff\xff\xe8\x56\xfd\xff\xff\xe8\x51\xfd\xff\xff\xe8\x4c\xfd\xff\xff'. However, I need some automation.

Thank you in advance!

Pedro
  • Did you try the `from codecs import encode` in that post? When you say "I've tried different solutions" you should post the code you tried. – William Jun 25 '21 at 22:52
  • I have tried with from codecs import encode, but I may have done it wrong, I have now added the code, thank you! – Pedro Jun 26 '21 at 14:14
  • Does this answer your question? [How do I remove double back slash (`\\`) from a bytes object?](https://stackoverflow.com/questions/38763771/how-do-i-remove-double-back-slash-from-a-bytes-object) – Darrius Jul 21 '21 at 05:45

3 Answers


If you can pass this data as binary to a Python script then you can deal with it like this:

import os
import sys

if __name__ == '__main__':
    bytes_arg = os.fsencode(sys.argv[1])
    print(bytes_arg)

~$ python script.py $'\x48\x83\xec\x08\x48\x8b\x05\xdd'
b'H\x83\xec\x08H\x8b\x05\xdd'
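A minimal sketch of why this round trip works, assuming a UTF-8 locale. The explicit 'utf-8' and 'surrogateescape' arguments below spell out that assumption; they are not something os.fsencode() asks you to pass. Bytes that are not valid UTF-8 reach sys.argv as lone surrogate characters, and encoding with the same error handler turns them back into the original bytes.

```python
# Raw bytes as the shell would hand them to the interpreter.
raw = b'\x48\x83\xec\x08\x48\x8b\x05\xdd'

# Assumption: a UTF-8 locale. Undecodable bytes become lone surrogates
# instead of raising an error, so the argument survives as a str.
as_argv = raw.decode('utf-8', 'surrogateescape')

# Encoding with the same error handler restores the original bytes.
restored = as_argv.encode('utf-8', 'surrogateescape')

print(ascii(as_argv))
print(restored)
print(restored == raw)  # True
```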

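But if the escapes are passed unquoted, the shell strips the backslashes and the script receives the string x48x83xecx08x48x8bx05xdd. In that case you can drop the x characters and parse the rest as hex: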

import os
import sys

if __name__ == '__main__':
    cleaned = ''.join(sys.argv[1].split('x'))
    bytes_arg = bytes.fromhex(cleaned)
    print(bytes_arg)

~$ python script.py \x48\x83\xec\x08\x48\x8b\x05\xdd
b'H\x83\xec\x08H\x8b\x05\xdd'
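The difference between a quoted and an unquoted argument comes down to shell quoting rules. As a rough illustration, Python's shlex module splits a command line with POSIX-style rules. It is only a stand-in for the shell here, and it does not understand bash's $'...' form.

```python
import shlex

# Unquoted: each backslash escapes the next character and is then dropped.
unquoted = shlex.split(r"python script.py \x48\x83\xec")

# Single-quoted: the backslashes are passed through untouched.
quoted = shlex.split(r"python script.py '\x48\x83\xec'")

print(unquoted[-1])  # x48x83xec
print(quoted[-1])    # \x48\x83\xec
```

Quoting the argument in the calling script therefore decides which of the two parsing approaches the Python side needs.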
K.Novichikhin
  • Hi! Thank you so much for the answer. The problem is that I would like to keep the bytes unchanged from the output of the command but accepted by Python as bytes as for example b'\x48\x83\xec\x08\x48\x8b\x05\xdd' I have updated the initial comment to show the code and how I get the bytes. Thank you again! – Pedro Jun 26 '21 at 14:13

Hope this is what you expected:

python -c "import sys;print(bytes.fromhex(sys.argv[1].replace(r'\x','')))" '\x48\x83\xec\x08\x48\x8b\x05\xdd'
# Output : b'H\x83\xec\x08H\x8b\x05\xdd'
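If the one-liner grows into a script, the same idea could be wrapped in a small function. This is only a sketch: parse_escaped_hex is a hypothetical name, and the validation step is an addition that the one-liner does not have.

```python
import re


def parse_escaped_hex(text):
    """Turn a string of literal \\xNN escapes into the bytes they name."""
    # Reject anything that is not a run of \xNN groups, so malformed
    # input fails loudly instead of producing the wrong bytes.
    if not re.fullmatch(r'(?:\\x[0-9a-fA-F]{2})*', text):
        raise ValueError(f'not a \\xNN sequence: {text!r}')
    # Same approach as the one-liner: drop every \x, parse the hex digits.
    return bytes.fromhex(text.replace(r'\x', ''))


print(parse_escaped_hex(r'\x48\x83\xec\x08\x48\x8b\x05\xdd'))
# b'H\x83\xec\x08H\x8b\x05\xdd'
```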

Based on your update:

test.sh

#!/bin/sh
byteString='\xe8\x5b\xfd\xff\xff\xe8\x56\xfd\xff\xff\xe8\x51\xfd\xff\xff\xe8\x4c\xfd\xff\xff'
PID=999
python3 test.py $PID "$byteString"

test.py

#!/usr/bin/python3

import re
import sys

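# argv[1] is the PID and argv[2] the byte string, so both are required.
if len(sys.argv) < 3:
    print("Please specify a PID and a byte string")
    sys.exit(1)
# Drop every literal \x and parse the remaining hex digits.
element = bytes.fromhex(sys.argv[2].replace(r'\x', ''))
print(element)
# output b'\xe8[\xfd\xff\xff\xe8V\xfd\xff\xff\xe8Q\xfd\xff\xff\xe8L\xfd\xff\xff'
# Rebuild the all-\x display form as a string, for printing only.
print("b'" + re.sub('(..)', r'\\x\1', element.hex()) + "'")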
# output b'\xe8\x5b\xfd\xff\xff\xe8\x56\xfd\xff\xff\xe8\x51\xfd\xff\xff\xe8\x4c\xfd\xff\xff'
Philippe
  • Hi! Thank you so much for the answer. The problem is that I would like to keep the bytes unchanged from the output of the command but accepted by Python as bytes as for example b'\x48\x83\xec\x08\x48\x8b\x05\xdd' I have updated the initial comment to show the code and how I get the bytes. – Pedro Jun 26 '21 at 14:04
  • For bytes, b'H' is exactly the same as b'\x48' – Philippe Jun 26 '21 at 14:47
  • Hi, yes you are right, although I need it in the form of \x48 as I am looking for a match with that format, so I am trying to keep that format in a byte string. – Pedro Jun 26 '21 at 15:24
  • Hi again! Thank you for the update. Sorry if I didn't explain myself correctly, the code works very well but I would need to end with a byte string to feed the script with. The same solution but as b'\xe8\x5b\xfd\xff\xff\xe8\x56...' instead of just a string, would there be a way of doing this? Really appreciate your help! – Pedro Jun 26 '21 at 16:15
  • You just saved me, sir, thank you so very much! – Pedro Jun 26 '21 at 16:37
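To illustrate the point made in the comments: a search over binary data should not depend on how the pattern was written or printed. A small sketch, where memory is made-up data standing in for whatever the script actually reads:

```python
# Hypothetical data standing in for a chunk of process memory.
memory = b'\x90\x90H\x83\xec\x08H\x8b\x05\xdd\xc3'

# The same eight bytes, written entirely with \x escapes.
needle = b'\x48\x83\xec\x08\x48\x8b\x05\xdd'

print(needle in memory)     # True
print(memory.find(needle))  # 2
```

The match succeeds because b'H' and b'\x48' are the same byte; only the printed form differs.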

You don't actually have to use the codecs module; I only used it in that original answer to make things more visually accommodating. Your question is practically identical to the one you referenced. Both the codecs.encode() function and the str.encode() method can use the raw_unicode_escape text encoding.

In fact, you can just do as follows:

sys.argv[2].encode('raw_unicode_escape')

Just remember that raw_unicode_escape neither escapes nor un-escapes backslashes when encoding or decoding.
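It is worth checking what a codec does with an argument that already contains literal backslashes, as sys.argv does here. In the sketch below, arg is a stand-in for sys.argv[2]. The raw_unicode_escape encoding leaves its 16 characters as 16 bytes. If the goal is the four bytes that the escapes name, one option (an alternative to the line above, not the same thing) is a unicode_escape decode wrapped in latin-1 conversions.

```python
# Stand-in for sys.argv[2]: 16 characters, backslashes included.
arg = r'\x48\x83\xec\x08'

# raw_unicode_escape keeps each backslash as a literal backslash byte.
raw = arg.encode('raw_unicode_escape')
print(len(raw))  # 16

# unicode_escape interprets \xNN; latin-1 maps code points 0-255
# one-to-one onto bytes in both directions.
decoded = arg.encode('latin-1').decode('unicode_escape').encode('latin-1')
print(decoded)       # b'H\x83\xec\x08'
print(len(decoded))  # 4
```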

All the current answers have given you what you wanted, but keep in mind that bytes objects are rendered differently when printed. Additionally, when you encode a string you don't need to call bytes() on the result, since str.encode() already returns a bytes object.

>>> b'\x48\x83\xec\x08\x48\x8b\x05\xdd' == b'H\x83\xec\x08\x48\x8b\x05\xdd'
True
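The display difference can be checked directly. In this sketch, real holds actual byte values, while literal (a raw bytes literal, used here only for illustration) holds the backslash and hex digits as text, which is what a plain .encode() produces from sys.argv:

```python
real = b'\x48\x83'      # two bytes: 0x48 and 0x83
literal = rb'\x48\x83'  # eight bytes: backslash, 'x', '4', '8', ...

print(real, len(real))        # b'H\x83' 2
print(literal, len(literal))  # b'\\x48\\x83' 8
```

The doubled backslash in the second line of output is how Python displays one literal backslash byte, so it signals that the escapes were never interpreted.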
Darrius