
I'm trying to convert an array of integers into their binary representations in Python. I know that native Python has a function called `bin` that does this, and NumPy has a similar function, `numpy.binary_repr`.

The problem is that neither of these is vectorized: each takes only a single value at a time. To convert a whole array of inputs, I have to use a for-loop and call the function once per element, which isn't very efficient.

Is there any way to perform this conversion without for-loops? Are there vectorized forms of these functions? I've tried `numpy.apply_along_axis` with no luck, and `np.fromiter` with `map` didn't work either.

I know similar questions have been asked a few other times (like here), but none of the answers given are actually vectorized.

Pointing me into any direction would be greatly appreciated!

Thanks =)

rafaelc
Felipe D.
  • Just out of curiosity, how big is your data? Just ran simple `[np.binary_repr(z) for z in x]` list comprehension for 1MM items and took `1.41 s ± 182 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)` – rafaelc Jul 23 '18 at 03:20
  • The datasets I'll be using will likely not be that large, usually on the order of 1,000, maybe 10,000. But I'm going to have to evaluate an objective function many times for optimization, so I'm trying to shave off processing time wherever I can. – Felipe D. Jul 23 '18 at 03:31
  • How many bits are your integers, and are they signed or unsigned? – Warren Weckesser Jul 23 '18 at 04:38
  • Unsigned integers - I'll only be working with positive ints. The largest int I'll have to convert is probably 10,000. So probably 16 bits would do just fine. – Felipe D. Jul 23 '18 at 04:41
  • Take a look at the answers to ["Convert integer to binary array with suitable padding"](https://stackoverflow.com/questions/22227595/convert-integer-to-binary-array-with-suitable-padding) – Warren Weckesser Jul 23 '18 at 04:43

2 Answers

The easiest way is to wrap `np.binary_repr` with `np.vectorize`. The wrapped function preserves the original array's shape, e.g.:

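import numpy as np

# Wrap the scalar function so it is applied element-wise to arrays of any shape
binary_repr_v = np.vectorize(np.binary_repr)
x = np.arange(-9, 21).reshape(3, 2, 5)
print(x)
print()
# The second argument is the width: strings are padded to 8 bits (two's complement for negatives)
print(binary_repr_v(x, 8))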

The output:

[[[-9 -8 -7 -6 -5]
  [-4 -3 -2 -1  0]]

 [[ 1  2  3  4  5]
  [ 6  7  8  9 10]]

 [[11 12 13 14 15]
  [16 17 18 19 20]]]

[[['11110111' '11111000' '11111001' '11111010' '11111011']
  ['11111100' '11111101' '11111110' '11111111' '00000000']]

 [['00000001' '00000010' '00000011' '00000100' '00000101']
  ['00000110' '00000111' '00001000' '00001001' '00001010']]

 [['00001011' '00001100' '00001101' '00001110' '00001111']
  ['00010000' '00010001' '00010010' '00010011' '00010100']]]
aparpara

The quickest way I've found (so far) is to use `pd.Series.apply()` with `bin`.

Here are the timing results:

# Run in IPython or Jupyter: %timeit is an IPython magic, not plain Python
import pandas as pd
import numpy as np

x = np.random.randint(1,10000000,1000000)

# Fastest method
%timeit pd.Series(x).apply(bin)
# 135 ms ± 539 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

# rafaelc's method
%timeit [np.binary_repr(z) for z in x]
# 725 ms ± 5.31 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

# aparpara's method
binary_repr_v = np.vectorize(np.binary_repr)
%timeit binary_repr_v(x, 8)
# 7.46 s ± 24.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
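
One caveat when reading these timings: the timed calls do not return identical strings. `bin` keeps the `0b` prefix, `np.binary_repr` without a width drops it, and a width argument adds zero padding. The small sketch below shows the difference on three sample values picked for illustration; the `format(..., "016b")` line is an extra plain-Python option assumed here for fixed-width output and is not one of the methods timed above.

```python
import numpy as np

x = np.array([1, 5, 10000])

# bin() keeps the '0b' prefix and does not pad.
with_prefix = [bin(int(v)) for v in x]
# np.binary_repr() without a width drops the prefix and does not pad.
no_prefix = [np.binary_repr(int(v)) for v in x]
# format() with the '016b' spec zero-pads to 16 digits, without a prefix.
padded = [format(int(v), "016b") for v in x]

print(with_prefix)  # ['0b1', '0b101', '0b10011100010000']
print(no_prefix)    # ['1', '101', '10011100010000']
print(padded)       # ['0000000000000001', '0000000000000101', '0010011100010000']
```

If the downstream code needs one specific format, it is probably worth timing the variant that produces exactly that format.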
Felipe D.