I'm working with 64-bit unsigned integers. After bit shifting, I compare the result against a known value before decoding the remaining bits. I'm iterating over millions of values and trying to minimize processing time.

The issue is that bit shifting is not supported for uint64 or numpy.uint64. I want to avoid int64 so that I don't get negative values.

Example data: 0x8204000000000080. After shifting (word >> 60) as int64 the result is -8, but I need to compare it against 0x8.

I looped one million times over each method and timed it. The '>>' shift operator was the fastest, and wrapping it in abs() was the next best option. Is there a better, faster solution?

Loop code:

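import numpy as np
import time

# 1. np.right_shift on a signed int64
start_time = time.time()
for i in range(1000000):
    x = np.int64(-1)
    x = np.right_shift(x, 60)
print(time.time() - start_time)

# 2. uint64 with true division instead of a shift (goes through float64)
start_time = time.time()
for i in range(1000000):
    x = np.uint64(2**64 - 1)  # all bits set; newer NumPy rejects np.uint64(-1)
    x = int(x / (2**60))
print(time.time() - start_time)

# 3. '>>' on a signed int64, then abs() to drop the sign
start_time = time.time()
for i in range(1000000):
    x = np.int64(-1)
    x = abs(x >> 60)
print(time.time() - start_time)

# 4. plain '>>' on a signed int64
start_time = time.time()
for i in range(1000000):
    x = np.int64(-1)
    x = x >> 60
print(time.time() - start_time)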

Output:

2.055999994277954
3.1540000438690186
0.619999885559082
0.5810000896453857
Paul H
kaminsknator

2 Answers

The issue is that when you apply the shift to an array scalar, NumPy tries to produce an output type that can hold all values of both input dtypes (with a Python int cast to either int32 or int64). There is no integer dtype that can hold all values of both uint64 and a signed dtype, and floats aren't an option here.

When one operand is an array and the other is a scalar (here a Python int), NumPy attempts to stuff the scalar into a smaller dtype, which for most shift operations means the shift amount is cast to int8 or uint8, depending on whether the other operand is signed. uint64 and uint8 both fit in uint64.
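The promotion behaviour described above can be checked directly. This is a minimal sketch, not part of the original answer: `np.promote_types` is used only to inspect the result dtype, and the `words` array is illustrative data built from the sample word in the question.

```python
import numpy as np

# uint64 combined with a signed 64-bit integer has no common integer dtype,
# so the promotion rules fall back to float64 (useless for bit shifts).
print(np.promote_types(np.uint64, np.int64))

# uint64 combined with a small unsigned dtype stays uint64.
print(np.promote_types(np.uint64, np.uint8))

# With a uint64 array on the left, a plain Python int shift amount is fine
# and the result keeps the uint64 dtype.
words = np.array([0x8204000000000080, 2**64 - 1], dtype=np.uint64)
top = words >> 60
print(top.dtype, top)
```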

You'll have to cast the shift amount to an unsigned int:

>>> numpy.uint64(-1) >> 1
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: ufunc 'right_shift' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''
>>> numpy.uint64(-1) >> numpy.uint64(1)
9223372036854775807
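Applied to the sample word from the question, the cast might look like the sketch below. The variable names and the 0xFF mask are made up for illustration. Casting a bitwise mask the same way keeps both operands uint64 as well. The last part assumes the data is already in a uint64 array, so the whole batch can be shifted in one call instead of looping in Python.

```python
import numpy as np

word = np.uint64(0x8204000000000080)  # sample word from the question

# Keep both operands uint64 so no signed dtype enters the promotion.
top_nibble = word >> np.uint64(60)
low_byte = word & np.uint64(0xFF)  # hypothetical mask, for illustration only

print(int(top_nibble) == 0x8)
print(hex(int(low_byte)))

# Vectorized form: shift and compare a whole uint64 array at once.
words = np.array([0x8204000000000080, 0x1204000000000080], dtype=np.uint64)
matches = (words >> np.uint64(60)) == np.uint64(0x8)
print(matches)
```

The explicit casts should be harmless on NumPy versions that already accept a bare Python int as the shift amount.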
user2357112
  • ahh good stuff ... not sure why it worked fine for me in older python and older numpy :P (+1 though since you could reproduce original issue) – Joran Beasley May 28 '15 at 18:19
  • Is this the same condition for all operations on an array scalar? For a bitwise &, would it be necessary to write data_word & np.uint64(mask)? Thanks for your feedback; it answers my question. – kaminsknator May 28 '15 at 18:29
2
>>> import numpy
>>> a = numpy.array([1,2,3],dtype=numpy.uint64)
>>> a>>2
array([0, 0, 0], dtype=uint64)

>>> a = numpy.array([1,2,2**64-1],dtype=numpy.uint64)
>>> a>>2 
array([0, 0, 4611686018427387903], dtype=uint64)
>>> a>>60
array([ 0,  0, 15], dtype=uint64)

Perhaps I don't understand the problem?

Joran Beasley
  • Perhaps a typo. The uint64 type seems to be preserved in the response. – dlask May 28 '15 at 18:03
  • In the output it looks like it converted your data type to int64. My data can't be converted to int64, as it would lose information. The largest possible positive number with int64 is 2^63-1, while the largest with uint64 is 2^64-1. My data contains numbers in this range. – kaminsknator May 28 '15 at 18:05
  • It didn't; it was a typo. It's still uint64 (I initially used int64 because I misread the original statement and didn't correct the output when I changed to uint). – Joran Beasley May 28 '15 at 18:06
  • The error is: TypeError: ufunc 'right_shift' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''. The operation that causes the error is (data_word>>60). – kaminsknator May 28 '15 at 18:08
  • hmmm what version of python are you using? it seems fine to me in python2.6 and numpy 1.9.0 – Joran Beasley May 28 '15 at 18:11
  • which leads me to believe the error is actually with how you are creating the array – Joran Beasley May 28 '15 at 18:12
  • using python3.4: array is made from:data = np.fromfile("ddla_dat.dat", np.dtype('uint64'), -1) – kaminsknator May 28 '15 at 18:12
  • well that certainly looks right ... can you repeat it with a small file? say 10 15 entries and give a link to download the dat file? – Joran Beasley May 28 '15 at 18:17