The reason for this increased number of bytes is how BSON saves the data. You can find this information in the BSON specification, but let's look at a concrete example:
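import numpy as np
import bson
# Five unsigned 8-bit values: 0, 11, 22, 33, 44
npdata = np.arange(5, dtype='B') * 11
listdata = npdata.tolist()
# rows and cols were not defined in the original snippet; a 1 x 5 shape is assumed here
rows, cols = 1, 5
# "data" is listed first so that it is the first element in the encoded document
bsondata = bson.BSON.encode({"data": listdata, "rows": rows, "cols": cols})
print([hex(b) for b in bsondata])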
Here, we store an array with the values [0, 11, 22, 33, 44] as BSON and print the resulting binary data. Below I have annotated the result to explain what's going on:
['0x47', '0x0', '0x0', '0x0', # total number of bytes in the document
# First element in document
'0x4', # Array
'0x64', '0x61', '0x74', '0x61', '0x0', # key: "data"
# subdocument (data array)
'0x28', '0x0', '0x0', '0x0', # total number of bytes in the array subdocument (40)
# first element in data array
'0x10', # 32 bit integer
'0x30', '0x0', # key: "0"
'0x0', '0x0', '0x0', '0x0', # value: 0
# second element in data array
'0x10', # 32 bit integer
'0x31', '0x0', # key: "1"
'0xb', '0x0', '0x0', '0x0', # value: 11
# third element in data array
'0x10', # 32 bit integer
'0x32', '0x0', # key: "2"
'0x16', '0x0', '0x0', '0x0', # value: 22
# ...
]
In addition to some format overhead, each value of the array is rather wastefully encoded with at least 7 bytes: 1 byte to specify the data type, 2 bytes for a null-terminated string containing the index (3 bytes for indices >= 10, 4 bytes for indices >= 100, ...), and 4 bytes for the 32-bit integer value. In the original uint8 array, each value occupies a single byte.
This at least explains why the BSON data is so much bigger than the original array.
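The per-element cost can be checked without any BSON library. The following is a minimal sketch using only the standard library. `encode_int32_array` is a hypothetical helper written for this illustration, not part of the bson package; it hand-builds the array subdocument following the layout annotated above and assumes every value fits in a signed 32-bit integer.

```python
import struct

def encode_int32_array(values):
    """Hand-build the BSON array subdocument for a list of small ints.

    Hypothetical helper for illustration only; assumes each value fits
    in a signed 32-bit integer, as in the example above.
    """
    body = b""
    for index, value in enumerate(values):
        body += b"\x10"                                # type byte: 32-bit integer
        body += str(index).encode("ascii") + b"\x00"   # key: index as a C string
        body += struct.pack("<i", value)               # value: little-endian int32
    # The 4-byte length prefix counts itself, the elements, and the trailing 0x00.
    total = 4 + len(body) + 1
    return struct.pack("<i", total) + body + b"\x00"

values = [0, 11, 22, 33, 44]
encoded = encode_int32_array(values)
print(len(values), "values ->", len(encoded), "bytes")
print(len(encode_int32_array(list(range(100)))), "bytes for 100 values")
```

For the five one-byte values this gives 40 bytes for the array subdocument alone, and the cost per value grows further once the indices need more than one digit.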
I found two libraries on GitHub, mongodb/bson-numpy and ajdavis/bson-numpy, which may do a better job of encoding NumPy arrays in BSON. However, I did not try them, so I can't say whether that is the case or whether they even work correctly.