If I sum over an array of 0s and 1s with a plain Python loop, I get a different result than doing the same computation over a numpy array. Why is that happening and what is the solution? The code is given below:

import numpy as np

vl_2 = vl_1 = 0
string_1 = "00001000100111000010001001100001000100110000100010011000010001011100001"
sb = string_1
table = bytearray.maketrans(b'01', b'\x00\x01')
X = bytearray(sb, "ascii").translate(table)  # one byte (0 or 1) per character
Y = 2.**(np.nonzero(X)[0]+1)  # float powers of two for the set bits
for i in range(len(sb)):
    vl_1 = vl_1 + X[i]*2**(i+1)
for y in np.nditer(Y):
    vl_2 = vl_2 + y

Note that I am doing the same math operation in both loops, so `vl_2 == vl_1` should be `True`, but I get `False`.

Edit:

  1. This problem occurred in vectorized code, so speed is an issue; any proposed solution should take that into account. That is, the solution should stay within numpy rather than rely on slower pure-Python approaches.
Michael
  • `x` on the last line (`vl_2=vl_2+x`) is not defined. – paime Jul 04 '22 at 13:34
  • Note that with numpy you have: `2 ** (np.array(62) + 1) == -9223372036854775808` and `2 ** (np.array(63) + 1) == 0` etc., due to 64-bit limitations. You don't have it with native Python `int` (see the sketch below these comments). – paime Jul 04 '22 at 14:01
  • Don't use `nditer`. It isn't needed to iterate on `Y`, and doesn't have any benefits (especially with this simple use), – hpaulj Jul 04 '22 at 14:07
  • @paime corrected (your first comment); about your second comment: I am using numpy for vectorization, so is there a way to increase the integer capacity inside a numpy array? – Michael Jul 04 '22 at 14:26
  • Iterating on an array is not "vectorization". It's actually slower than iterating on a list. – hpaulj Jul 04 '22 at 14:39
  • @hpaulj I have not presented the whole code here, only the problematic part is posted. – Michael Jul 04 '22 at 14:41
  • @Michael You need to use `dtype=object`, I posted it as an answer. – paime Jul 04 '22 at 14:49
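paime's overflow observation is easy to reproduce (a minimal sketch, assuming numpy's default integer dtype is 64-bit):

import numpy as np

# integer powers are computed in a fixed-width dtype and wrap silently on overflow
print(2 ** (np.array(62) + 1))   # -9223372036854775808, i.e. wrapped to -2**63
print(2 ** (np.array(63) + 1))   # 0, i.e. 2**64 mod 2**64
print(2 ** 63)                   # 9223372036854775808: a plain Python int stays exact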

3 Answers


The loop over `np.nditer(Y)` accumulates float64 values, which lose precision once the numbers grow too large for a float to represent exactly (the point where the printout switches to scientific notation). I changed the loop a little bit:

vl_2_2 = 0
for y in np.nditer(Y):
    vl_2 = vl_2 + y                  # original float accumulation
    vl_2_2 = vl_2_2 + int(y.item())  # convert to an exact Python int first
    print(f'{vl_2} {int(vl_2)} {vl_2_2}')

`vl_2` is the original.

`vl_2_2` does the calculation after converting each `y` to an `int`.

In the printout I also print `vl_2` as an `int` after the calculation.

The results are the same in both loops until the values grow past the range where float64 represents every integer exactly, which is also where the printout switches to scientific notation.
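The underlying limit is easy to check (a quick sketch, assuming standard IEEE-754 float64): integers are exact only up to 2**53.

# float64 carries a 53-bit significand, so above 2**53 not every
# integer is representable
print(float(2**53) == 2**53)      # True
print(float(2**53 + 1) == 2**53)  # True: 2**53 + 1 rounds back down to 2**53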

First loop (exact running sums only; the duplicate columns are omitted):

32
544
4640
12832
29216
553504
8942112
76050976
210268704
4505236000
73224712736
622980526624
1722492154400
36906864243232
599856817664544
5103456445035040
14110655699776032
302341031851487776
4914027050278875680
23360771123988427296
60254259271407530528
134041235566245736992
2495224477001068343840

Second loop (the first number in each row is the original `vl_2`):

32.0 32 32
544.0 544 544
4640.0 4640 4640
12832.0 12832 12832
29216.0 29216 29216
553504.0 553504 553504
8942112.0 8942112 8942112
76050976.0 76050976 76050976
210268704.0 210268704 210268704
4505236000.0 4505236000 4505236000
73224712736.0 73224712736 73224712736
622980526624.0 622980526624 622980526624
1722492154400.0 1722492154400 1722492154400
36906864243232.0 36906864243232 36906864243232
599856817664544.0 599856817664544 599856817664544
5103456445035040.0 5103456445035040 5103456445035040
1.4110655699776032e+16 14110655699776032 14110655699776032
3.0234103185148774e+17 302341031851487744 302341031851487776
4.914027050278875e+18 4914027050278875136 4914027050278875680
2.3360771123988427e+19 23360771123988426752 23360771123988427296
6.025425927140753e+19 60254259271407534080 60254259271407530528
1.3404123556624574e+20 134041235566245740544 134041235566245736992
2.4952244770010683e+21 2495224477001068314624 2495224477001068343840
Guy

With your setup (I like to see some actual values, not just a vague "not the same" claim):

In [70]: Y
Out[70]: 
array([3.20000000e+01, 5.12000000e+02, 4.09600000e+03, 8.19200000e+03,
       1.63840000e+04, 5.24288000e+05, 8.38860800e+06, 6.71088640e+07,
       1.34217728e+08, 4.29496730e+09, 6.87194767e+10, 5.49755814e+11,
       1.09951163e+12, 3.51843721e+13, 5.62949953e+14, 4.50359963e+15,
       9.00719925e+15, 2.88230376e+17, 4.61168602e+18, 1.84467441e+19,
       3.68934881e+19, 7.37869763e+19, 2.36118324e+21])


In [72]: X
Out[72]: bytearray(b'\x00\x00\x00\x00\x01\x00\x00\x00\x01\x00\x00\x01\x01\x01\x00\x00\x00\x00\x01\x00\x00\x00\x01\x00\x00\x01\x01\x00\x00\x00\x00\x01\x00\x00\x00\x01\x00\x00\x01\x01\x00\x00\x00\x00\x01\x00\x00\x00\x01\x00\x00\x01\x01\x00\x00\x00\x00\x01\x00\x00\x00\x01\x00\x01\x01\x01\x00\x00\x00\x00\x01')

In [73]: for i in range(len(sb)): 
    ...:                     vl_1 = vl_1+X[i]*2**(i+1)
    ...:                     

In [74]: vl_1
Out[74]: 2495224477001068343840


In [76]: for y in np.nditer(Y)  :
    ...:                     vl_2=vl_2+y
    ...:                     

In [77]: vl_2
Out[77]: 2.4952244770010683e+21

One is a float (after all, `Y` is float), but otherwise the values are the same (within float precision):

In [78]: vl_1-vl_2
Out[78]: 0.0
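
The subtraction gives exactly 0.0 because the exact Python int is converted to float64 before subtracting, and that conversion rounds. A quick check of the rounding (a sketch reusing `vl_1` from above):

# converting the exact int to float64 and back exposes the rounding
print(vl_1)               # 2495224477001068343840 (exact)
print(int(float(vl_1)))   # 2495224477001068314624 (nearest float64)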

nditer does nothing for you:

In [79]: vl_2=0
    ...: for y in Y  : vl_2=vl_2+y

In [80]: vl_2
Out[80]: 2.4952244770010683e+21

But iterating on arrays is slower than iterating on a list, and you don't need it at all:

In [81]: np.sum(Y)
Out[81]: 2.4952244770010683e+21

edit

If you replace 2. with 2 when constructing Y:

In [95]: 2.**(np.nonzero(X)[0]+1)
Out[95]: 
array([3.20000000e+01, 5.12000000e+02, 4.09600000e+03, 8.19200000e+03,
       1.63840000e+04, 5.24288000e+05, 8.38860800e+06, 6.71088640e+07,
       1.34217728e+08, 4.29496730e+09, 6.87194767e+10, 5.49755814e+11,
       1.09951163e+12, 3.51843721e+13, 5.62949953e+14, 4.50359963e+15,
       9.00719925e+15, 2.88230376e+17, 4.61168602e+18, 1.84467441e+19,
       3.68934881e+19, 7.37869763e+19, 2.36118324e+21])

In [96]: 2**(np.nonzero(X)[0]+1)
Out[96]: 
array([                 32,                 512,                4096,
                      8192,               16384,              524288,
                   8388608,            67108864,           134217728,
                4294967296,         68719476736,        549755813888,
             1099511627776,      35184372088832,     562949953421312,
          4503599627370496,    9007199254740992,  288230376151711744,
       4611686018427387904,                   0,                   0,
                         0,                   0], dtype=int64)

The second gives integer values, but the last 4 are too large for int64, so they overflow and wrap around to 0.
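To see which bit positions are affected (a quick check, reusing `X` from the question):

# exponents of the set bits; 2**k no longer fits in int64 once k >= 63,
# since int64's maximum is 2**63 - 1
exponents = np.nonzero(X)[0] + 1
print(exponents[exponents >= 63])   # [64 65 66 71]: the four entries that wrapped to 0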

Skipping the last part of X I get the same integer result:

In [100]: sum(2**(np.nonzero(X[:-8])[0]+1))
Out[100]: 4914027050278875680

In [101]: sum([x*2**(i+1) for i,x in enumerate(X[:-8])])
Out[101]: 4914027050278875680

The other answer suggested going with object dtype. While it may work, it loses most of the speed advantages of working with numeric-dtype arrays.

object speed

As proposed in the other answer, converting the nonzero result to object dtype produces large-enough Python ints:

In [166]: 2**(np.nonzero(X)[0]+1).astype(object)
Out[166]: 
array([32, 512, 4096, 8192, 16384, 524288, 8388608, 67108864, 134217728,
       4294967296, 68719476736, 549755813888, 1099511627776,
       35184372088832, 562949953421312, 4503599627370496,
       9007199254740992, 288230376151711744, 4611686018427387904,
       18446744073709551616, 36893488147419103232, 73786976294838206464,
       2361183241434822606848], dtype=object)

Some comparative times

In [167]: timeit np.sum(2**(np.nonzero(X)[0]+1).astype(object))
46.5 µs ± 1.64 µs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

The approximate float approach:

In [168]: timeit np.sum(2.**(np.nonzero(X)[0]+1))
32.3 µs ± 1.12 µs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

The enumerated list:

In [169]: timeit sum([x*2**(i+1) for i,x in enumerate(X)])
43.1 µs ± 1.14 µs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

Working with an object dtype array doesn't help, speedwise.

The list version using `nonzero_bits` is even faster:

In [173]: %%timeit
     ...: nonzero_bits = [i for i, x in enumerate(X) if x != 0]
     ...: vl = sum(2 ** (i + 1) for i in nonzero_bits)
18.9 µs ± 221 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
hpaulj
  • in `In [78]`, doing `vl_2==vl_1` gives `False`. I got the following output: ` 2.4952244770010683e+21 2495224477001068343840 False` – Michael Jul 04 '22 at 14:54
  • How about `vl_2==float(vl_1)`? I looked more at your `Y`. The last few elements are too large to represent with numpy ints. And floats are also an approximation. With these large values, individually and summed, you'll have to stick with Python ints, or accept the difference. – hpaulj Jul 04 '22 at 15:05
  • That misses the point of the code; both are supposed to be the same, i.e. the large value should also be exact, and the approximation produces wrong output (the problem arises in the main, larger code). Is there any other package that allows vectorization without the problem I posted? – Michael Jul 04 '22 at 15:09
  • In `numpy` "vectorization" means using the compiled numpy methods. It gains speed because the arrays have compact storage with straightforward C level iteration. To encode those large values of `Y`, you need extended range `ints`, which require a more flexible storage (more than 8 bytes per value). – hpaulj Jul 04 '22 at 15:19
  • Is there a way/instruction/method to extend the range of `ints`? How do I do that? – Michael Jul 04 '22 at 15:24
  • You found it - using Python `int`. The fast compiled numpy array dtypes and methods do not support extended range. – hpaulj Jul 04 '22 at 16:00

First, using numpy but still iterating with a Python for loop is not vectorization and will not improve performance (it will be even worse, because of numpy array instantiation overhead).

Second, you're handling very large numbers, beyond the capacity of numpy's native C types, but native Python `int` can handle them, so you need to specify `dtype=object` so that numpy does not cast to a fixed-width type (see https://stackoverflow.com/a/37272717/13636407).

Even then, because of `dtype=object`, numpy can't truly vectorize, so there is no performance improvement from using numpy, as @hpaulj noticed (see the performance tests below).

import numpy as np

def using_list(s):
    X = to_bytearray(s)
    nonzero_bits = [i for i, x in enumerate(X) if x != 0]
    return sum(2 ** (i + 1) for i in nonzero_bits)

def using_numpy(s):
    # because large numbers, need to convert to dtype=object
    # see https://stackoverflow.com/a/37272717/13636407
    X = to_bytearray(s)
    nonzero_bits = np.nonzero(X)[0].astype(object)
    return np.sum(2 ** (nonzero_bits + 1))

table = bytearray.maketrans(b"01", b"\x00\x01")

def to_bytearray(s):
    return bytearray(s, "ascii").translate(table)

Equality check:

s = "00001000100111000010001001100001000100110000100010011000010001011100001"

vl_list = using_list(s)
vl_numpy = using_numpy(s)

assert vl_list == vl_numpy

Performance tests (the plain list wins on the short string; numpy's overhead only pays off for longer inputs, where the two end up comparable):

>>> %timeit using_list(s)
... %timeit using_numpy(s)
... print()
... %timeit using_list(s * 10)
... %timeit using_numpy(s * 10)
... print()
... %timeit using_list(s * 100)
... %timeit using_numpy(s * 100)

10.1 µs ± 81 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
18.1 µs ± 146 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)

128 µs ± 700 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
104 µs ± 605 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

9.88 ms ± 14.7 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
9.77 ms ± 108 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
paime