I'm working with a bytearray built from file data. I open the file as 'r+b', so I can modify it as binary.

The Python 3.7 docs explain that each match object yielded by a regex's finditer() provides m.start() and m.end(), which identify the start and end of the match.

In the question "Insert bytearray into bytearray Python", the answer says an insert can be made into a bytearray by using slice assignment. But when I attempt this, I get the following error: BufferError: Existing exports of data: object cannot be re-sized.

Here is an example:

    import re

    pat = re.compile(rb'0.?\d* [nN]')   # regex, binary "0[.*] n"
    with open(file, mode='r+b') as f:   # updateable, binary
        d = bytearray(f.read())         # read file data into d as a mutable bytearray
        it = pat.finditer(d)            # find pattern in data as iterable
        for match in it:                # for each match,
            m = match.group()           # bytes of the match string to binary m
            ...
            val = b'0123456789 n'
            ...
            d[match.start():match.end()] = bytearray(val)

In the file, the match is 0 n and I'm attempting to replace it with 0123456789 n, which would insert 9 bytes. This code changes the data successfully as long as the replacement has the same length as the match; it fails only when the replacement would increase the size. What am I doing wrong? Here is output showing the same-size replacements working, and the failure when inserting digits:

*** Changing b'0.0032 n' to b'0.0640 n'
len(d): 10435, match.start(): 607, match.end(): 615, len(bytearray(val)): 8
*** Found: "0.0126 n"; set to [0.252] or custom:
*** Changing b'0.0126 n' to b'0.2520 n'
len(d): 10435, match.start(): 758, match.end(): 766, len(bytearray(val)): 8
*** Found: "0 n"; set to [0.1] or custom:
*** Changing b'0 n' to b'0.1 n'
len(d): 10435, match.start(): 806, match.end(): 809, len(bytearray(val)): 5
Traceback (most recent call last):
  File "fixV1.py", line 190, in <module>
    main(sys.argv)
  File "fixV1.py", line 136, in main
    nchanges += search(midfile)     # perform search, returning count
  File "fixV1.py", line 71, in search
    d[match.start():match.end()] = bytearray(val)
BufferError: Existing exports of data: object cannot be re-sized
rdtsc
  • What are the values of `len(d)`, `match.start()`, `match.end()` and `len(bytearray(val))`? – zariiii9003 Jun 09 '20 at 22:52
  • What does this part of the regex, `0.?\d*`, mean to you? –  Jun 09 '20 at 23:45
  • RegEx `0.?\d* [nN]` means "data starts with a `0`, has an optional `.`, and 0 or more digits. Then is followed by a " " character, and either an `n` or `N`." It appears to be matching correctly in all cases. – rdtsc Jun 10 '20 at 12:32

1 Answer

This is a simple case, much like modifying an iterable during iteration:

  • it = pat.finditer(d) creates a buffer from the bytearray object. This in turn "locks" the bytearray object from being changed in size.
  • d[match.start():match.end()] = bytearray(val) attempts to modify the size on the "locked" bytearray object.

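Just as changing a dict's size while iterating over it raises an error, attempting to change a bytearray's size while iterating over its buffer fails. Replacements that keep the size unchanged are still allowed, which is why the same-length edits in the question succeed.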
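The behaviour can be reproduced in a few lines. This is a minimal sketch with made-up sample data (the bytes and the pattern `0 n` are illustrative, not taken from the asker's file): while the finditer() iterator is alive, a same-length slice assignment succeeds and a length-changing one raises BufferError.

```python
import re

d = bytearray(b"a 0 n b")            # illustrative sample data
it = re.finditer(rb"0 n", d)         # the iterator keeps a buffer export of d
match = next(it)

# Same length (3 bytes -> 3 bytes): allowed, no resize is needed.
d[match.start():match.end()] = b"1 n"

# Different length (3 bytes -> 5 bytes): needs a resize, which is refused
# while the iterator still holds its export.
try:
    d[match.start():match.end()] = b"0.1 n"
    resized = True
except BufferError as exc:
    resized = False
    print(exc)
```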

You can give a copy of the object to finditer().
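A sketch of that approach, under some assumptions: `replace_all` and `new_value` are hypothetical names, and the sample bytes are made up. One detail to watch is that the match offsets refer to the unmodified copy, so once a replacement changes the length, later offsets have to be shifted by the number of bytes added or removed so far.

```python
import re

pat = re.compile(rb'0.?\d* [nN]')    # the pattern from the question

def replace_all(d, new_value):
    """Replace every match in bytearray d with new_value, in place.

    Hypothetical helper: matching runs on a copy, so d itself has no
    outstanding buffer exports and is free to change size.
    """
    shift = 0                                    # net bytes added so far
    for match in pat.finditer(d.copy()):         # iterate over a copy of d
        start = match.start() + shift            # translate copy offsets
        end = match.end() + shift                # into offsets in d
        d[start:end] = new_value
        shift += len(new_value) - (match.end() - match.start())
    return d

d = bytearray(b"a 0 n b 0.5 n c")                # illustrative sample data
replace_all(d, b"0123456789 n")
print(d)
```

Iterating over the matches in reverse order would avoid the offset bookkeeping, at the cost of collecting all matches into a list first.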

For more information about buffers and how Python works under the hood, see the Python docs.


Also, keep in mind that you're not actually modifying the file. You'll need to either write the data back to the file or use memory-mapped files. I suggest the latter if you're looking for efficiency.
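For the write-back option, a minimal sketch follows. `save_modified` and `grow` are hypothetical names, and the temporary file only stands in for the real one. After reading, the file position is at the end, so the code seeks back to the start before writing, and truncates in case the new data is shorter than the old.

```python
import os
import tempfile

def save_modified(path, modify):
    """Read a file into a bytearray, let modify() change it, write it back."""
    with open(path, mode="r+b") as f:
        d = bytearray(f.read())   # file position is now at end of file
        modify(d)                 # may grow or shrink d
        f.seek(0)                 # go back to the start before writing
        f.write(d)
        f.truncate()              # drop leftover bytes if d got shorter

def grow(d):
    # Illustrative edit: replace 3 bytes with 5 bytes.
    d[2:5] = b"0.1 n"

# Demonstration on a throwaway file.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"a 0 n b")
path = tmp.name

save_modified(path, grow)
with open(path, "rb") as f:
    result = f.read()
os.remove(path)
print(result)
```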

Bharel
  • The idea was (at first) to open the file, read it (into [d]), modify it by adding bytes, then write it back. But this fails on the slicing-insert, not on the file save. In the docs for MMAP, it reads "# note that new content must have same size." I guess the question becomes, can [d] be created as non-locked? – rdtsc Aug 26 '20 at 12:42
  • @rdtsc The length can be changed. Yes, it fails on the slice-insert because the object is locked. The only way to do this is by creating a copy of the data, either immediately or lazily. Try `it = pat.finditer(d.copy())` and you'll see it works. – Bharel Aug 26 '20 at 21:08