
I need a way to convert 20 million 32- and 64-bit integers into their corresponding bit arrays, so this has to be memory- and time-efficient. On advice from a different question/answer here on SO, I'm attempting to do this with `numpy.unpackbits`. While experimenting with this method I ran into unexpected results:

np.unpackbits(np.array([1], dtype=np.uint64).view(np.uint8))

produces:

array([0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], dtype=uint8)

I expected the 1 to be the last element, not somewhere in the middle. So I'm obviously missing something that preserves the byte order. What am I missing?

Dmitry B.
  • I didn't see documentation proving this, but I assumed that when I create an array of type int64 and populate it with data smaller in size, every element would be cast into a long. I.e. an equivalent of cast in C, which should pad higher order bits with `0`s. – Dmitry B. May 12 '16 at 16:19
  • 20 million! But I hope not hand edited O_o – linusg May 12 '16 at 16:20
  • [This answer](http://stackoverflow.com/a/18296281/8747) suggests you read [this link](http://docs.scipy.org/doc/numpy/user/basics.byteswapping.html). – Robᵩ May 12 '16 at 16:21
  • @StevenRumbalski: I can't. `np.unpackbits` expects a byte array – Dmitry B. May 12 '16 at 16:22
  • @Robᵩ: that's the question/answer I got the idea from. I did look at that link and I didn't think it applied here because my data (`1`) is generated on the same computer that is running the python process. I can take another look for other cues. – Dmitry B. May 12 '16 at 16:25

1 Answer


Try an explicitly big-endian dtype such as `dtype='>i8'` (or `'>u8'` for unsigned values), like so:

In [6]: np.unpackbits(np.array([1], dtype='>i8').view(np.uint8))
Out[6]: 
array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1], dtype=uint8)

Reference:

http://docs.scipy.org/doc/numpy/user/basics.byteswapping.html
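For an array that already exists in native byte order, the same idea can be applied in bulk. What follows is only a minimal sketch, not a tuned implementation: `to_bits` is a hypothetical helper name, and it assumes a 1-D input of unsigned integers and a width of 8, 16, 32 or 64 bits.

```python
import numpy as np

def to_bits(values, width=64):
    """Return an (n, width) uint8 array of bits, most significant bit first.

    Hypothetical helper for illustration; assumes a 1-D input and
    width in {8, 16, 32, 64}.
    """
    # Explicit big-endian unsigned dtype, e.g. '>u8' for 64 bits.
    be_dtype = np.dtype(f">u{width // 8}")
    # astype() converts the values (reordering bytes on little-endian hosts);
    # view() then reinterprets the buffer byte by byte without copying.
    as_bytes = np.asarray(values).astype(be_dtype).view(np.uint8)
    return np.unpackbits(as_bytes).reshape(-1, width)

bits64 = to_bits(np.array([1, 2], dtype=np.uint64))
bits32 = to_bits(np.array([1], dtype=np.uint32), width=32)
print(bits64[0, -8:])  # [0 0 0 0 0 0 0 1]
print(bits32.shape)    # (1, 32)
```

Note that the result stores one byte per bit, so 20 million 64-bit integers expand to roughly 1.28 GB; processing the input in chunks may be worth considering.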

Robᵩ
  • D'oh! OK, I should've read past the first couple of paragraphs. – Dmitry B. May 12 '16 at 16:28
  • Though I still don't understand why I have to do this in my case since the data is generated and consumed under the same memory architecture. – Dmitry B. May 12 '16 at 16:31
  • Because your PC is little-endian, and you've asked for a big-endian representation. – Robᵩ May 12 '16 at 16:31
  • "you've asked for a big-endian representation" My initial code doesn't make any byte order requests. Does `view` by default assume big-endianness of the underlying data? I thought `view(np.uint8)` simply resulted in a byte-by-byte read of memory, which would mean that the data there was already in big-endian order. So what imposed that order on a little-endian system? – Dmitry B. May 12 '16 at 19:56
  • I mean in your question, *you*, not your program, asked for a big-endian representation. The representation you objected to was little-endian. – Robᵩ May 12 '16 at 20:03
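The point made in these comments can be checked directly. The sketch below assumes nothing beyond NumPy and the standard library: `view(np.uint8)` exposes the bytes exactly as they sit in memory, so the position of the `1` depends only on the byte order of the dtype that wrote them.

```python
import sys
import numpy as np

# Same value, three storage orders; view() reinterprets the bytes as-is.
little = np.array([1], dtype="<u8").view(np.uint8)
big = np.array([1], dtype=">u8").view(np.uint8)
native = np.array([1], dtype=np.uint64).view(np.uint8)

print(sys.byteorder)    # 'little' on x86-64 and most ARM systems
print(little.tolist())  # [1, 0, 0, 0, 0, 0, 0, 0]
print(big.tolist())     # [0, 0, 0, 0, 0, 0, 0, 1]

# The native layout matches whichever order the host CPU uses.
expected = little if sys.byteorder == "little" else big
print(native.tolist() == expected.tolist())  # True
```

On a little-endian machine the least significant byte comes first, which is why the original `unpackbits` output had its `1` at the end of the first byte rather than at the end of the array.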