I have a BytesIO buffer into which the contents of a file in S3 are being written. That file contains pybson BSON objects separated by \n characters, i.e. binary documents separated by newlines.
I want to parse the event objects in the file. I am iterating through each line like this:
    from io import BytesIO
    import bson  # provided by pybson

    def iter_event(data: BytesIO):
        for line in data:
            yield bson.loads(line)
I am finding that some rogue characters seem to be injected, or the data corrupted, at the end of the line variable in some cases, and my code is failing with the same exception as mentioned briefly in one of the comments in this SO question. When I look at the file using a binary editor I cannot see the rogue character; it seems to occur only in the line variable. (For what it's worth, the end of the BSON object looks like \x00\x00\n in a binary editor, and my line variable ends in \x00\x10e\n.)
Is there an issue with iterating through each line like this? If so, what's a better approach please?
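For reference, here is a minimal sketch of one alternative to line iteration. It relies on the BSON format itself: every document begins with its total length as a little-endian int32, so a reader can take exactly that many bytes instead of splitting on \n (a 0x0A byte can legitimately appear inside a binary document, for example in an integer value of 10). The function name iter_bson_docs is hypothetical, and the sketch assumes that exactly one \n follows each document, as described above. It yields raw bytes, which could then be passed to bson.loads.

```python
import struct
from io import BytesIO
from typing import Iterator


def iter_bson_docs(data: BytesIO) -> Iterator[bytes]:
    """Yield raw BSON documents using their length prefix, not newlines.

    Assumption: each document is followed by a single b"\\n" separator.
    """
    while True:
        header = data.read(4)
        if not header:
            return  # clean end of stream
        if len(header) < 4:
            raise ValueError("truncated BSON length prefix")

        # The first four bytes hold the total document size (little-endian
        # int32), and that size includes the four length bytes themselves.
        (length,) = struct.unpack("<i", header)
        if length < 5:
            raise ValueError(f"invalid BSON document length: {length}")

        body = data.read(length - 4)
        if len(body) != length - 4:
            raise ValueError("truncated BSON document")
        yield header + body

        # Consume the separator assumed to follow each document.
        sep = data.read(1)
        if sep not in (b"", b"\n"):
            raise ValueError(f"expected newline separator, got {sep!r}")
```

With this sketch, the loop body in iter_event would become `for raw in iter_bson_docs(data): yield bson.loads(raw)`. Whether this matches how the file was actually written is an assumption worth checking against the producer.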