I'm trying to use msgpack to write a list of dictionaries to a file. However, when I iterate over an instance of Unpacker, it seems like the number 10 is unpacked between each 'real' document.

The test script I'm running is:

import msgpack
from faker import Faker
import logging
from logging.handlers import RotatingFileHandler

fake = Faker()
fake.seed(0)

data_file = "my_log.log"

logger = logging.getLogger('my_logger')
logger.setLevel(logging.DEBUG)
handler = RotatingFileHandler(data_file, maxBytes=2000, backupCount=10)
logger.addHandler(handler)

fake_dicts = [{'name': fake.name()} for _ in range(100)]

for item in fake_dicts:
    dump_string = msgpack.packb(item)
    logger.debug(dump_string)

unpacker = msgpack.Unpacker(open(data_file))

for unpacked in unpacker:
    print unpacked

where I've used fake-factory to generate fake data. The resulting printed output is as follows:

{'name': 'Joshua Carter'}
10
{'name': 'David Williams'}
10
{'name': 'Joseph Jones'}
10
{'name': 'Gary Perry'}
10
{'name': 'Terry Wells'}
10
{'name': 'Vanessa Cooper'}
10
{'name': 'Michael Simmons'}
10
{'name': 'Nicholas Kline'}
10
{'name': 'Lori Bennett'}
10

I don't understand why the number 10 is printed between the dictionaries. Is it somehow introduced by the logger?

Kurt Peek
  • My first thought is it is converting a line feed (Unicode 10) to an integer. Try `print msgpack.packb(item)` to print it directly, see if it's being introduced there or in the Unpacker. – K Richardson Oct 19 '16 at 13:35
  • It seems indeed that it is coming from the newline character introduced by the logger. I'm going to try to use `handler.terminator = ""` following http://stackoverflow.com/questions/7168790/suppress-newline-in-python-logging-module (after upgrading to Python 3). – Kurt Peek Oct 19 '16 at 13:45

1 Answer

The `10` is coming from the contents of the unpacker itself. You can replicate it yourself like this:

In [23]: unpacker = msgpack.Unpacker(open(data_file))

In [24]: unpacker.next()
Out[24]: {'name': 'Edward Ruiz'}

In [25]: unpacker.next()
Out[25]: 10
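
To illustrate why the unpacker yields `10`, here is a minimal sketch (Python 3, assuming a recent `msgpack` release that accepts `raw=False`). It does not use the logger at all; it simply writes a newline byte after each packed record, which is what the handler's line terminator appears to be doing in the question. In the msgpack format, any single byte from 0x00 to 0x7F is a complete integer, so the newline (0x0A) decodes as `10`. The sample names are taken from the question's output.

```python
import io
import msgpack

# Simulate what ends up in the log file: each packed record followed by "\n".
records = [{"name": "Joshua Carter"}, {"name": "David Williams"}]
stream = io.BytesIO()
for record in records:
    stream.write(msgpack.packb(record))
    stream.write(b"\n")  # byte 0x0A, standing in for the handler's terminator
stream.seek(0)

# Bytes 0x00-0x7F are complete msgpack objects (positive fixint),
# so the Unpacker decodes each stray newline as the integer 10.
decoded = list(msgpack.Unpacker(stream, raw=False))
print(decoded)

# The same thing in isolation: a lone newline byte unpacks to 10.
newline_value = msgpack.unpackb(b"\n")
print(newline_value)
```

So the Unpacker is behaving as specified; the extra byte has to be kept out of the file when it is written.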
Noah Gift
  • Any ideas on how I can prevent the `10`s from entering the `msgpack` in the first place? – Kurt Peek Oct 19 '16 at 16:01
  • I haven't used msgpack before, but I briefly looked at the spec here: https://github.com/msgpack/msgpack/blob/master/spec.md. At first glance, after about 30 seconds of looking, this seems to be expected behavior. – Noah Gift Oct 19 '16 at 17:08
  • Potentially a generator expression would handle this nicely: http://www.dabeaz.com/generators/ – Noah Gift Oct 19 '16 at 17:10
  • I'm pretty sure the newline characters come from the `logger`, because in Python 3 in a version of the script using (human-readable) JSON instead of msgpack I can see the effect of the `handler.terminator = ""` command. – Kurt Peek Oct 19 '16 at 17:20
  • Maybe disable logging: http://stackoverflow.com/questions/2266646/how-to-i-disable-and-re-enable-console-logging-in-python – Noah Gift Oct 19 '16 at 17:54
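
One possible way to keep the `10`s out of the file in the first place, sketched here under the assumption that log rotation is not required: bypass `logging` and append the packed bytes to a file opened in binary mode. The file name `my_data.msgpack` and the temporary directory are hypothetical, chosen only to keep the example self-contained. Note that in Python 3 setting `handler.terminator = ""` alone is probably not enough, because `logging` converts the message with `str()`, so a `bytes` object would be written as its textual `b'...'` representation rather than as raw bytes.

```python
import os
import tempfile
import msgpack

records = [{"name": "Joshua Carter"}, {"name": "David Williams"}]

# Hypothetical file name; a temporary directory keeps the sketch self-contained.
data_file = os.path.join(tempfile.mkdtemp(), "my_data.msgpack")

# Append each packed record in binary mode, with no separator between records.
with open(data_file, "ab") as f:
    for record in records:
        f.write(msgpack.packb(record))

# Read the records back in binary mode; msgpack objects are self-delimiting,
# so the Unpacker finds the record boundaries without any terminator.
with open(data_file, "rb") as f:
    restored = list(msgpack.Unpacker(f, raw=False))

print(restored)
```

This gives up what `RotatingFileHandler` provides, so rotation would have to be handled separately if it is needed.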