In cybersecurity we often use the strings binary to extract plaintext string data from a memory dump. I'm trying to do the equivalent in Python.
from struct import unpack

find_str = "King"
strings = []
# `data` is the parsed dump object, defined earlier (not shown).
for stream in data.streams:
    if type(stream.data) == bytes:
        # Not a particularly readable solution, but orders of magnitude faster
        # than alternatives: https://stackoverflow.com/a/57543519/9400421
        unpacked = list(unpack(f'{len(stream.data)}c', stream.data))
        string = ''
        null = b'\x00'
        for byte in unpacked:
            try:
                # Ultimately I'll need to track a separate strings list for each
                # popular encoding scheme to catch diverse string encodings.
                decoded = byte.decode('ASCII')
                print(byte, '=>', decoded)
                if byte == null:
                    print(byte, '=>', 'null')
                    if string != '':
                        strings.append(string)
                        string = ''
                else:
                    string += decoded
            except UnicodeDecodeError:
                print("couldn't decode:", byte)
        if string != '':
            strings.append(string)
            string = ''
print(strings)
output: ..., '*', '\x7f', '\x10', '\x10', '\x04', '\x01', '\x12+', '\x7f', '*', '\x7f', '@', '\x10', '\x02', '\x01', '\x10\x13+', '\x7f', '\x0c', '\x01', ...
My problem is that this outputs a lot of decoded values which are obviously not normal characters: they show up as hex escape sequences such as '\x7f' and '\x10'.
My first question is: why do these bytes decode to hex escape sequences rather than normal characters, yet not trigger my except clause? I thought anything that didn't decode "cleanly" as a character would be filtered out by my code.
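
To illustrate, a minimal REPL check (Python 3) reproduces what I'm seeing: bytes like 0x10 and 0x7f decode under ASCII without error, while bytes of 0x80 and above do raise:

>>> b'\x10'.decode('ASCII')
'\x10'
>>> b'\x7f'.decode('ASCII')
'\x7f'
>>> b'\x80'.decode('ASCII')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0x80 in position 0: ordinal not in range(128)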
My second question is: how can I discard these "trash" characters, i.e. filter them out from the "cleanly" decoded ones?
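
The best idea I have so far (not sure it's the right approach) is to treat only printable ASCII as "clean", flush the current run on anything else, and keep runs of at least 4 characters, roughly mimicking the default behavior of the strings binary. A rough sketch, with a helper name (extract_ascii_strings) of my own invention:

import string

PRINTABLE = set(string.printable)

def extract_ascii_strings(buf, min_len=4):
    # Yield runs of printable ASCII at least min_len characters long.
    # Iterating over a bytes object yields ints, hence the chr() call.
    run = ''
    for value in buf:
        char = chr(value)
        if char in PRINTABLE:
            run += char
        else:
            if len(run) >= min_len:
                yield run
            run = ''
    if len(run) >= min_len:
        yield run

Is filtering on string.printable like this reasonable, or is there a more idiomatic way?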