4

I'm working with a string of bytes (which can be anywhere between 10kb and 3MB) and I need to filter out approximately 16 bytes (replacing them with other bytes)

At the moment I have a function a bit like this..

BYTE_REPLACE = {
  52: 7, # first number is the byte I want to replace
  53: 12, # while the second number is the byte I want to replace it WITH
}
def filter(st):
  for b in BYTE_REPLACE:
    st = st.replace(chr(b),chr(BYTE_REPLACE[b]))
  return st

(Byte list paraphrased for the sake of this question)

Using map resulted in an execution time of ~.33s, while this results in a 10x faster time of ~.03s (Both performed on a HUGE string, larger than 1.5MB compressed).

While any performance gains would be considerably negligible, is there a better way of doing this?

(I am aware that it would be much more optimal to store the filtered string. This isn't an option, though. I'm fooling with a Minecraft Classic server's level format and have to filter out bytes that certain clients don't support)

falsetru
  • 357,413
  • 63
  • 732
  • 636
MoJi
  • 78
  • 1
  • 6
  • 1
    How are you reading in the string? Is it from the file system, from a URL, is it already all in memory? That will probably have a big influence on the most optimal method. –  Sep 17 '13 at 03:20
  • It's all available in memory (and passed straight to the function in every case) There is a few cases where a single byte will be passed to this function - this is negligible enough that I'm not bothered by it. – MoJi Sep 17 '13 at 03:20
  • How many pairs are in `BYTE_REPLACE`? Just 2? –  Sep 17 '13 at 03:22
  • 16 usually. With the full list, and a loadtest level that is fairly large (512*512*256 bytes uncompressed), it takes .03s to do the full replacement (with str.replace) – MoJi Sep 17 '13 at 03:24
  • 2
    [`string.maketrans`](http://docs.python.org/2/library/string.html#string.maketrans) and [`string.translate`](http://docs.python.org/2/library/string.html#string.translate) may help here. – user2357112 Sep 17 '13 at 03:25
  • related: [Best way to strip punctuation from a string in Python](http://stackoverflow.com/q/265960/4279) – jfs Sep 17 '13 at 04:12

3 Answers3

7

Use str.translate:

Python 3.x

def subs(st):
    return st.translate(BYTE_REPLACE)

Example usage:

>>> subs('4567')
'\x07\x0c67'

Python 2.x

str.translate (Python 2)

import string
k, v = zip(*BYTE_REPLACE.iteritems())
k, v = ''.join(map(chr, k)), ''.join(map(chr, v))
tbl = string.maketrans(k, v)
def subs(st):
    return st.translate(tbl)
falsetru
  • 357,413
  • 63
  • 732
  • 636
  • Using str.translate was approximately twice as fast as the previous way I was doing it. – MoJi Sep 17 '13 at 03:45
4

Look up the translate() method on strings. That allows you to do any number of 1-byte transformations in a single pass over the string. Use the string.maketrans() function to build the translation table. If you usually have 16 pairs, this should run about 16 times faster than doing 1-byte replacements 16 times.

Tim Peters
  • 67,464
  • 13
  • 126
  • 132
0

In your current design, String.replace() is being called on the string n times, for each pair. While its most likely an efficient algorithm, on a 3MB string it might slow down.

If the string is already contained in memory by the time this function is called, I'd wager that the most efficient way would be:

BYTE_REPLACE = {
  52: 7, # first number is the byte I want to replace
  53: 12, # while the second number is the byte I want to replace it WITH
}
def filter(st):
  st = list(st) # Convert string to list to edit in place :/
  for i,s in enumerate(st): #iterate through list
    if ord(s) in BYTE_REPLACE.keys():
        s[i]=chr(BYTE_REPLACE[ord(b)])
  return "".join(st) #return string

There is a large operation to create a new list at the start, and another to convert back to a string, but since python strings are immutable in your design a new string is made for each replacement.

This is all based on conjecture, and could be wrong. You'd want to test it with your actual data.