17

I'm trying to write a script that extracts strings from an executable binary and saves them to a file. Making this file newline-separated isn't an option, since the strings themselves could contain newlines. For the same reason, the Unix "strings" utility isn't an option: it prints all the strings newline-separated, so there's no way to tell from its output which strings contain newlines. I'm therefore hoping to find a Python function or library that implements the same functionality as "strings" but gives me the strings as variables, so that I can avoid the newline issue.

Thanks!

joshlf
  • Possible duplicate of http://stackoverflow.com/questions/11599226/how-to-convert-binary-string-to-ascii-string-in-python – Fredrik Pihl Jun 19 '13 at 16:01
  • @FredrikPihl, that's about converting between binary and textual representations. This is about extracting strings from a binary executable. Confusingly reused terminology, but different questions. Thanks for the catch, though; it would've been good to know if this was a dup. – joshlf Jun 19 '13 at 16:07
  • you are right, this is the third Q I misinterpreted today; need some sleep :-) – Fredrik Pihl Jun 19 '13 at 16:15

4 Answers

25

Here's a generator that yields all the strings of printable characters >= min (4 by default) in length that it finds in filename:

import string

def strings(filename, min=4):
    # Read raw bytes, then decode as latin-1: every byte maps to exactly one
    # character, so no byte is dropped and separate runs are never merged.
    with open(filename, "rb") as f:
        data = f.read().decode("latin-1")
    result = ""
    for c in data:
        if c in string.printable:
            result += c
            continue
        if len(result) >= min:
            yield result
        result = ""
    if len(result) >= min:  # catch result at EOF
        yield result

Which you can iterate over:

for s in strings("something.bin"):
    print(repr(s))  # do something with s

... or store in a list:

sl = list(strings("something.bin"))

I've tested this very briefly, and it seems to give the same output as the Unix strings command for the arbitrary binary file I chose. However, it's pretty naïve (for a start, it reads the whole file into memory at once, which might be expensive for large files), and is very unlikely to approach the performance of the Unix strings command.
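If reading the whole file at once is a concern, the same idea can be applied chunk by chunk. The following is only a sketch under a few assumptions: the name `strings_chunked`, the `chunk_size` parameter and the `PRINTABLE` set are made up for illustration, and "printable" is taken to mean the bytes of `string.printable`, as above. A single very long run is still accumulated in memory.

```python
import string

# Byte values treated as printable, mirroring string.printable (an assumption).
PRINTABLE = frozenset(string.printable.encode("ascii"))

def strings_chunked(filename, min=4, chunk_size=64 * 1024):
    """Yield printable runs of at least `min` bytes, reading in chunks."""
    run = bytearray()  # current run; it survives chunk boundaries
    with open(filename, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            for b in chunk:  # iterating over bytes yields ints in Python 3
                if b in PRINTABLE:
                    run.append(b)
                    continue
                if len(run) >= min:
                    yield run.decode("ascii")
                run.clear()
    if len(run) >= min:  # catch a run that ends at EOF
        yield run.decode("ascii")
```

Because the run buffer is kept across reads, a string that straddles two chunks is still returned in one piece.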

Zero Piraeus
  • That's pretty good. Extra challenge: can you figure out how to only read through the initialized and loaded sections of an object file? ;) – joshlf Jun 19 '13 at 17:03
  • @joshlf13 Which format of object file? ELF? XCOFF? Windows Portable Executable? Depending on the format, you could find a dedicated Python library to extract the various sections. Please post another question if you're interested in that topic. – Sylvain Leroux Jun 19 '13 at 17:10
  • @joshlf13 Not without increasing my knowledge of the structure of object files from "none at all" to "more than none at all", no ;-) Unless they're a lot simpler than I imagine, that would in any case make the code significantly more complex (to the extent that you'd be better off hunting down a library to parse them). – Zero Piraeus Jun 19 '13 at 17:14
  • @SylvainLeroux, I was mostly joking. But in practicality, yes, those would be valid concerns. – joshlf Jun 19 '13 at 17:16
  • @ZeroPiraeus, just so you know, I ended up downloading the strings source and modifying it to separate by a null character instead of a newline. It was surprisingly easy, all things considered. – joshlf Jun 19 '13 at 17:17
  • Why does this function print only integer values in Python 3? – sundar_ima Oct 03 '16 at 15:46
  • @sundar_ima because it was written very quickly for Python 2 only, and Python 3 handles `bytes` (specified by the `"rb"` argument to `open()`) [differently](http://stackoverflow.com/q/14267452/1014938). I've updated the function to work correctly with Python 3 now - thanks for the catch! – Zero Piraeus Oct 03 '16 at 19:01
6

To quote man strings:

STRINGS(1)                   GNU Development Tools                  STRINGS(1)

NAME
       strings - print the strings of printable characters in files.

[...]
DESCRIPTION
       For each file given, GNU strings prints the printable character
       sequences that are at least 4 characters long (or the number given with
       the options below) and are followed by an unprintable character.  By
       default, it only prints the strings from the initialized and loaded
       sections of object files; for other types of files, it prints the
       strings from the whole file.

You could achieve a similar result by using a regex that matches at least 4 printable characters. Something like this:

>>> import re

>>> content = "hello,\x02World\x88!"
>>> re.findall("[^\x00-\x1F\x7F-\xFF]{4,}", content)
['hello,', 'World']

Please note that this solution requires the entire file content to be loaded into memory.
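For data read from an actual binary file in Python 3, the pattern has to be a bytes pattern. One possible way to avoid reading the file into a Python object first is to memory-map it and let `re` scan the mapping. This is a sketch rather than a tested replacement for strings; `regex_strings` and `PRINTABLE_RUN` are names invented here, and the matches come back as `bytes`.

```python
import mmap
import re

# Same character class as the regex above, written as a bytes pattern.
PRINTABLE_RUN = re.compile(rb"[^\x00-\x1F\x7F-\xFF]{4,}")

def regex_strings(filename):
    """Return every run of 4+ printable bytes in `filename` as bytes objects."""
    with open(filename, "rb") as f:
        try:
            mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
        except ValueError:
            return []  # mmap cannot map an empty file
        with mm:
            # re accepts bytes-like objects, so it can scan the mapping directly.
            return PRINTABLE_RUN.findall(mm)
```

Like the original regex, this character class excludes control characters, so a newline or tab ends a run.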

Sylvain Leroux
0

The strings command lets you change the output separator with --output-separator, so instead of a newline character you can use a custom string (one you wouldn't expect to find in your binary files). Newlines can be included in the extracted strings with --include-all-whitespace:

$ strings --include-all-whitespace --output-separator="YOURSEPARATOR" test.bin
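To get those strings into Python variables, the command's output can be captured and split on the separator. The sketch below assumes a GNU strings build that supports both flags and that the separator never occurs inside the binary's strings; the helper names `split_strings_output` and `gnu_strings` are hypothetical. Empty pieces are dropped, on the assumption that the separator is printed after every string, including the last.

```python
import subprocess

SEPARATOR = "YOURSEPARATOR"  # placeholder taken from the command above

def split_strings_output(raw, separator=SEPARATOR):
    """Split raw strings output (bytes) on the separator, dropping empty pieces."""
    return [s for s in raw.split(separator.encode("ascii")) if s]

def gnu_strings(path, separator=SEPARATOR):
    """Run strings with a custom separator and return the strings as bytes."""
    result = subprocess.run(
        ["strings", "--include-all-whitespace",
         "--output-separator=" + separator, path],
        stdout=subprocess.PIPE, check=True)
    return split_strings_output(result.stdout, separator)
```

Calling `gnu_strings("test.bin")` would then return a list in which embedded newlines are preserved inside the individual items.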

buddemat
-1

You can also call strings directly, for example like this:

import subprocess

def strings(bytestring: bytes, min: int = 10) -> str:
    # An argument list avoids shell=True; -n sets the minimum string length.
    process = subprocess.Popen(
        ["strings", "-n", str(min)],
        stdout=subprocess.PIPE, stderr=subprocess.STDOUT, stdin=subprocess.PIPE)
    # communicate() writes the input and closes stdin, so large inputs
    # cannot deadlock on a full pipe.
    output = process.communicate(bytestring)[0]
    return output.decode("ascii")
TheCharlatan