
I would like to look up a vector in an array of three-dimensional vectors and get a boolean value indicating whether it was found.

With a set in Python 3, the code looks like this:

import random

class Unit(object):
    def __init__(self, idx_vertex1, idx_vertex2, weight):
        # weight is accepted but not used for equality or hashing
        self._idx_vertex1 = idx_vertex1
        self._idx_vertex2 = idx_vertex2

    def __eq__(self, that):
        # compare against Unit (the original checked an undefined name, Edge)
        if not isinstance(that, Unit):
            return False
        info_that = that.get_info()
        return (self._idx_vertex1 == info_that[0]
                and self._idx_vertex2 == info_that[1])

    def __hash__(self):
        # Cantor pairing of the two sorted vertex indices
        idx_sort = sorted([self._idx_vertex1, self._idx_vertex2])
        return ((idx_sort[0]+idx_sort[1])*(idx_sort[0]+idx_sort[1]+1)//2
            +idx_sort[1])

    def get_info(self):
        return [self._idx_vertex1, self._idx_vertex2]

# module-level function, not a method of Unit
def main():
    units = set()
    for elem in random.sample(range(3070055), 3070055):
        units.add(Unit(elem, elem+1, 3))

With NumPy, the setup is simpler:

import numpy as np

test_np = np.random.randint(3070055, size=(3070055, 3))

However, the lookup speeds differ greatly:

>>> Unit(44, 45, 5) in units
Lookup CPU Time: 0.00010633468627929688
>>> np.isin([303,5,822], test_np)
Lookup CPU Time: 1.5214931964874268

Is there any way to speed up the NumPy lookup? Thanks!
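One thing worth noting about the comparison: `np.isin` tests each element on its own, so it reports whether 303, 5, and 822 each occur anywhere in the array, not whether `[303, 5, 822]` occurs as a row. A minimal sketch of a row-wise check follows, assuming that a whole-row match is what is wanted. The helper name `contains_row` and the smaller array size are illustrative choices, not part of the original code, and this is still a linear scan over the array.

```python
import numpy as np

def contains_row(arr, vec):
    """Return True if vec equals at least one complete row of arr."""
    vec = np.asarray(vec)
    # Broadcasting compares vec against every row; all(axis=1) keeps only
    # rows where every component matches, any() collapses to one boolean.
    return bool((arr == vec).all(axis=1).any())

rng = np.random.default_rng(0)
test_np = rng.integers(0, 3070055, size=(1000, 3))  # smaller than the original, for illustration
test_np[10] = [303, 5, 822]  # plant a known row so the lookup succeeds

print(contains_row(test_np, [303, 5, 822]))  # True, because the row was planted above

# Element-wise isin can report True for all components even when no row matches.
small = np.array([[1, 2, 3], [4, 5, 6]])
print(np.isin([1, 2, 6], small).all())  # True
print(contains_row(small, [1, 2, 6]))   # False
```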

dinex
  • NumPy works with arrays! This is mainly a data structures question: sets are optimized for lookup tasks. You should use the structure that better suits your needs :) – m33n Jul 06 '18 at 07:30
  • [Related question](https://stackoverflow.com/a/50881584/9209546), I know it's about Pandas but the underlying functionality and solution utilize NumPy / Cython. It's messy and may not be the road you want to go down. There's no in-built optimization, as far as I know. – jpp Jul 06 '18 at 08:58
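The point about sets being built for lookup can be sketched as follows: if many lookups are made against the same array, the rows can be converted once into hashable tuples and stored in a set. This is only an illustration under that assumption. The names `row_set` and `has_vector` are hypothetical, the array is smaller than in the question, and the one-time conversion costs both time and memory.

```python
import numpy as np

rng = np.random.default_rng(0)
test_np = rng.integers(0, 3070055, size=(100_000, 3))  # reduced size for illustration

# One-time O(n) cost: tolist() yields plain Python ints, and each row
# becomes a hashable tuple that can live in a set.
row_set = set(map(tuple, test_np.tolist()))

def has_vector(vec):
    """Average O(1) membership test for a 3-vector against the prebuilt set."""
    return tuple(int(x) for x in vec) in row_set

print(has_vector(test_np[0]))     # True: this row came from the array itself
print(has_vector([-1, -1, -1]))   # False: negative values are never generated
```

Whether this pays off depends on how many lookups are performed relative to the cost of building the set.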
