I have a one-dimensional numpy array. Every value in it should be an integer multiple of 10. I need to search through it and find every place where the step between adjacent values is greater than 10, i.e. 20, 30, etc. When I find one, I need to identify the indices where the break occurs and use them for some processing. I also need to pick out individual values that are isolated and do something different with them. For example,

0, 10, 20, 60, 80, 90, 100

From 0 to 20 is one set, so I need to get back the 0 and the 20 and do something with them; likewise 80 to 100 is a set and the same should happen. The 60, though, is a value on its own, so I need to see that it's an isolated single value, do some separate processing for it, and then resume at 80 to get the right result.

Been trying to figure out a reasonable way to do this with numpy and haven't come up with much. The datasets are very large, so the more efficient the better. There should in theory be no duplication of value within sets and they should always progress as we move forward through the array. Thanks for any help in advance.

Will
  • I'd start with `np.diff` to find the pairwise difference between adjacent samples, followed by logical indexing to find where the differences are big enough to split the array into subarrays. See how far you can get with `np.diff` and update the question with code if you run into problems. – Ahmed Fasih May 10 '18 at 00:28
  • Possible duplicate of [how to find the groups of consecutive elements from an array in numpy?](https://stackoverflow.com/questions/7352684/how-to-find-the-groups-of-consecutive-elements-from-an-array-in-numpy) – AGN Gazer May 10 '18 at 02:08
  • Best answer: https://stackoverflow.com/a/7353335/8033585 : `np.split(data, np.where(np.diff(data) != stepsize)[0]+1)` – AGN Gazer May 10 '18 at 02:11
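
The `np.diff` idea from the comments above can be sketched as follows. This is a minimal illustration rather than a tested solution for the asker's data: the function name `run_bounds` and the `step` parameter are made up for the example, and it assumes a non-empty, strictly increasing array.

```python
import numpy as np

def run_bounds(data, step=10):
    """Return (starts, stops): index bounds of each run of `step`-spaced values.

    Hypothetical helper for illustration; assumes `data` is non-empty
    and strictly increasing.
    """
    data = np.asarray(data)
    # Index i is a break when data[i + 1] - data[i] differs from `step`.
    breaks = np.flatnonzero(np.diff(data) != step)
    # A run starts at index 0 and right after every break ...
    starts = np.concatenate(([0], breaks + 1))
    # ... and stops at every break and at the last index.
    stops = np.concatenate((breaks, [len(data) - 1]))
    return starts, stops

data = np.array([0, 10, 20, 60, 80, 90, 100])
starts, stops = run_bounds(data)
for s, e in zip(starts, stops):
    if s == e:
        print("isolated value:", data[s])
    else:
        print("run from", data[s], "to", data[e])
```

A run whose start and stop index coincide is an isolated value, so both cases from the question fall out of the same pair of index arrays without materialising the subarrays.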

1 Answer

You can certainly group the values into the required sets. There are a couple of ways of doing it, e.g. with a closure:

import itertools as it

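def gn(init=0):
    # Closure state: the previous value seen and the current group number.
    _x = init
    _count = 0
    def fn(x):
        nonlocal _count, _x  # Python 3 only
        # Move to a new group number whenever the jump from the previous value exceeds 10.
        _count += x - _x > 10
        _x = x
        return _count
    return fn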

In []:
x = [0, 10, 20, 60, 80, 90, 100]
[list(g) for k, g in it.groupby(x, gn())]

Out[]:
[[0, 10, 20], [60], [80, 90, 100]]

Or similarly with a class:

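class G:
    def __init__(self, init=0):
        self._x = init      # previous value seen
        self._count = 0     # current group number
    def __call__(self, x):
        # Move to a new group number whenever the jump from the previous value exceeds 10.
        self._count += x - self._x > 10
        self._x = x
        return self._count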

In []:
x = [0, 10, 20, 60, 80, 90, 100]
[list(g) for k, g in it.groupby(x, G())]

Out[]:
[[0, 10, 20], [60], [80, 90, 100]]

Then you can just iterate over these groups, and if a group's len() == 1, do something different with it.
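
As a rough sketch of that last step, the loop below tags each group as either an isolated value or a run and keeps only the endpoints of a run. The names `GapKey` and `summarize` are invented for this example, and the tuples it returns are just a stand-in for whatever processing is actually needed.

```python
import itertools as it

class GapKey:
    """Key function for groupby: the key changes when the gap exceeds `step`."""
    def __init__(self, step=10):
        self._step = step
        self._prev = None   # previous value seen, None before the first call
        self._count = 0     # current group number
    def __call__(self, x):
        if self._prev is not None and x - self._prev > self._step:
            self._count += 1
        self._prev = x
        return self._count

def summarize(values, step=10):
    """Return ('isolated', v) for single values and ('run', first, last) for sets."""
    results = []
    for _, g in it.groupby(values, GapKey(step)):
        group = list(g)
        if len(group) == 1:
            results.append(("isolated", group[0]))
        else:
            results.append(("run", group[0], group[-1]))
    return results

print(summarize([0, 10, 20, 60, 80, 90, 100]))
# [('run', 0, 20), ('isolated', 60), ('run', 80, 100)]
```

A class is used for the key so the sketch does not depend on `nonlocal`.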

AChampion
  • This method sounds best; however, when I run it, it tells me: `File "test.py", line 21 nonlocal _count, _x ^ SyntaxError: invalid syntax` – Will May 10 '18 at 14:20
  • `nonlocal` was only added in Python 3. You must be using Python 2.x. Just use the `class`. – AChampion May 10 '18 at 15:23
  • Not sure about the error, but got it to work as a class, so I went that route. It's very fast, thank you! – Will May 10 '18 at 17:14