How to optimize binary file manipulation?

Question

Here is my code:

def decode(filename):
    with open(filename, "rb") as binary_file:
        # Read the whole file at once
        data = bytearray(binary_file.read())

    for i in range(len(data)):
        data[i] = 0xff - data[i]

    with open("out.log", "wb") as out:
        out.write(data)

I have a file of around 10MB, and I need to transform it by flipping every bit and saving the result to a new file.

My code takes around 1 second to process a 10MB file, while it takes less than 1ms in C.

This is my first Python script. I don't know if it is right to use bytearray. The most time-consuming part is the loop over the bytearray.

You don't have to loop explicitly over the array; this is very slow in pure Python. E.g. data = 0xff - data is enough (without any explicit loop). Another alternative is to use a compiler like Numba (LLVM-based, much like Clang and Flang). Example on bit shifting using Numba: https://stackoverflow.com/a/45070947/4045774 — max9111, Dec 04 '18 at 10:32

martineau · Accepted Answer · 2018-12-04T21:48:48.170

If using the numpy library is an option, it would be much^★ faster, since it can perform the operation on all the bytes in a single statement. Byte-level operations in pure Python on a relatively large amount of data are inherently slow compared to a module like numpy, which is implemented in C and optimized for array processing.

^{★ Although not by quite as much in Python 2 as in 3 (see results below).}
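As a minimal standalone sketch of the single-statement idea, separate from the benchmark framework, something like the following should work. The function name `invert_file` and its two path parameters are illustrative and not taken from the question; `np.bitwise_not` is used here and gives the same result as `0xff - data` for unsigned 8-bit values.

```python
import numpy as np


def invert_file(in_path, out_path):
    """Flip every bit of the file at in_path and write the result to out_path.

    Hypothetical helper: the name and signature are for illustration only.
    """
    with open(in_path, "rb") as f:
        # View the raw bytes as an array of unsigned 8-bit integers (no copy).
        data = np.frombuffer(f.read(), dtype=np.uint8)

    # One vectorized operation over the whole array instead of a Python loop.
    flipped = np.bitwise_not(data)

    with open(out_path, "wb") as f:
        f.write(flipped.tobytes())
```

The whole file is still read into memory at once, as in the question, so this is only suitable for files that comfortably fit in RAM.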

The following is a framework I set up to benchmark using it vs the code in your question. It may seem like a lot of code, but most of it is just part of the scaffolding for making performance comparisons.

I encourage others answering this question to also make use of it.

from __future__ import print_function
from collections import namedtuple
import os
import sys
from random import randrange
from textwrap import dedent
from tempfile import NamedTemporaryFile
import timeit
import traceback


N = 1  # Number of executions of each "algorithm".
R = 3  # Number of repetitions of those N executions.

UNITS = 1024 * 1024  # MBs
FILE_SIZE = 10 * UNITS

# Create test files. Must be done here at module-level to allow file
# deletions at end.
with NamedTemporaryFile(mode='wb', delete=False) as inp_file:
    FILE_NAME_IN = inp_file.name
    print('Creating temp input file: "{}", length {:,d}'.format(FILE_NAME_IN, FILE_SIZE))
    inp_file.write(bytearray(randrange(256) for _ in range(FILE_SIZE)))

with NamedTemporaryFile(mode='wb', delete=False) as out_file:
    FILE_NAME_OUT = out_file.name
    print('Creating temp output file: "{}"'.format(FILE_NAME_OUT))


# Common setup for all testcases (executed prior to any Testcase specific setup).
COMMON_SETUP = dedent("""
    from __main__ import FILE_NAME_IN, FILE_NAME_OUT
""")

class Testcase(namedtuple('CodeFragments', ['setup', 'test'])):
    """ A test case is composed of separate setup and test code fragments. """
    def __new__(cls, setup, test):
        """ Dedent code fragment in each string argument. """
        return tuple.__new__(cls, (dedent(setup), dedent(test)))

testcases = {
    "user3181169": Testcase("""
        def decode(filename, out_filename):
            with open(filename, "rb") as binary_file:
                # Read the whole file at once
                data = bytearray(binary_file.read())

            for i in range(len(data)):
                data[i] = 0xff - data[i]

            with open(out_filename, "wb") as out:
                out.write(data)

        """, """
        decode(FILE_NAME_IN, FILE_NAME_OUT)
        """
    ),

    "using numpy": Testcase("""
        import numpy as np

        def decode(filename, out_filename):
            with open(filename, 'rb') as file:
                data = np.frombuffer(file.read(), dtype=np.uint8)

            # Applies mathematical operation to entire array.
            data = 0xff - data

            with open(out_filename, "wb") as out:
                out.write(data)
        """, """
        decode(FILE_NAME_IN, FILE_NAME_OUT)
        """,
    ),
}

# Collect timing results of executing each testcase multiple times.
try:
    results = [
        (label,
         min(timeit.repeat(testcases[label].test,
                           setup=COMMON_SETUP + testcases[label].setup,
                           repeat=R, number=N)),
        ) for label in testcases
    ]
except Exception:
    traceback.print_exc(file=sys.stdout)  # direct output to stdout
    sys.exit(1)

# Display results.
major, minor, micro = sys.version_info[:3]
bitness = 64 if sys.maxsize > 2**32 else 32
print('Fastest to slowest execution speeds using ({}-bit) Python {}.{}.{}\n'
      '({:,d} execution(s), best of {:d} repetition(s)'.format(
            bitness, major, minor, micro, N, R))
print()

longest = max(len(result[0]) for result in results)  # length of longest label
ranked = sorted(results, key=lambda t: t[1]) # ascending sort by execution time
fastest = ranked[0][1]
for result in ranked:
    print('{:>{width}} : {:9.6f} secs, relative speed: {:6,.2f}x, ({:8,.2f}% slower)'
          ''.format(
                result[0], result[1], round(result[1]/fastest, 2),
                round((result[1]/fastest - 1) * 100, 2),
                width=longest))

# Clean-up.
for filename in (FILE_NAME_IN, FILE_NAME_OUT):
    try:
        os.remove(filename)
    except OSError:  # FileNotFoundError is not defined in Python 2
        pass

Output (Python 3):

Creating temp input file: "T:\temp\tmpw94xdd5i", length 10,485,760
Creating temp output file: "T:\temp\tmpraw4j4qd"
Fastest to slowest execution speeds using (32-bit) Python 3.7.1
(1 execution(s), best of 3 repetition(s)

using numpy :  0.017744 secs, relative speed:   1.00x, (    0.00% slower)
user3181169 :  1.099956 secs, relative speed:  61.99x, (6,099.14% slower)

Output (Python 2):

Creating temp input file: "t:\temp\tmprk0njd", length 10,485,760
Creating temp output file: "t:\temp\tmpvcaj6n"
Fastest to slowest execution speeds using (32-bit) Python 2.7.15
(1 execution(s), best of 3 repetition(s)

using numpy :  0.017930 secs, relative speed:   1.00x, (    0.00% slower)
user3181169 :  0.937218 secs, relative speed:  52.27x, (5,126.97% slower)
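If numpy is not available, one possible alternative (not part of the benchmark above, and not timed here) is to let the standard library do the per-byte work in C via `bytes.translate` with a 256-entry lookup table. This is a sketch for Python 3 only; the names `FLIP_TABLE` and `invert_file_translate` are made up for illustration.

```python
# Lookup table: the entry at index b is the bit-flipped value of byte b.
FLIP_TABLE = bytes(0xFF - b for b in range(256))


def invert_file_translate(in_path, out_path):
    """Flip every bit of in_path using bytes.translate (Python 3 only).

    Hypothetical helper: the name and signature are for illustration only.
    """
    with open(in_path, "rb") as f:
        data = f.read()

    # translate() maps each byte through the table without a Python-level loop.
    with open(out_path, "wb") as f:
        f.write(data.translate(FLIP_TABLE))
```

It could be added to the `testcases` dictionary as another `Testcase` entry to compare it against the two approaches measured above.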

Good work. decode takes 10s, decode2 takes 10ms. Writing byte by byte is not a good approach. mmap doesn't speed things up compared with a direct read(). — B. M., Dec 04 '18 at 13:00
@B.M.: Thanks. You're absolutely right about `mmap` not helping, so I replaced that with a version using only `numpy` (and added fairly generic code to benchmark different approaches). — martineau, Dec 04 '18 at 19:39
