Split NumPy array according to values in the array (a condition)

Question

I have an array:

    arr = [(1,1,1), (1,1,2), (1,1,3), (1,1,4)...(35,1,22),(35,1,23)]

I want to split my array according to the third value in each tuple. I want each third value of 1 to be the start of a new array. The results should be:

    [(1,1,1), (1,1,2),...(1,1,35)][(1,2,1), (1,2,2),...(1,2,46)]

and so on. I know numpy.split should do the trick but I'm lost as to how to write the condition for the split.

To clarify: Every time the 3rd component is a 1 you want to split and have that be the start of a new array? — emschorsch, Jul 17 '15 at 20:15

hpaulj · Answer 1 · 2015-07-17T20:45:13.713

3

Here's a quick idea, working with a 1d array. It can be easily extended to work with your 2d array:

In [385]: x=np.arange(10)

In [386]: I=np.where(x%3==0)

In [387]: I
Out[387]: (array([0, 3, 6, 9]),)

In [389]: np.split(x,I[0])
Out[389]: 
[array([], dtype=float64),
 array([0, 1, 2]),
 array([3, 4, 5]),
 array([6, 7, 8]),
 array([9])]

The key is to use np.where to find the indices where you want np.split to act. Because index 0 is one of the split points here, the first piece in the result is empty.

For a 2d array

First make a sample 2d array, with something interesting in the 3rd column:

In [390]: arr=np.ones((10,3))
In [391]: arr[:,2]=np.arange(10)
In [392]: arr
Out[392]: 
array([[ 1.,  1.,  0.],
       [ 1.,  1.,  1.],
       ...
       [ 1.,  1.,  9.]])

Then use the same np.where call, with a boolean test on the 3rd column, to find the indices to split on:

In [393]: I=np.where(arr[:,2]%3==0)

In [395]: np.split(arr,I[0])
Out[395]: 
[array([], dtype=float64),
 array([[ 1.,  1.,  0.],
       [ 1.,  1.,  1.],
       [ 1.,  1.,  2.]]),
 array([[ 1.,  1.,  3.],
       [ 1.,  1.,  4.],
       [ 1.,  1.,  5.]]),
 array([[ 1.,  1.,  6.],
       [ 1.,  1.,  7.],
       [ 1.,  1.,  8.]]),
 array([[ 1.,  1.,  9.]])]
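
For anyone who wants to run this outside IPython, here is a minimal sketch of the same 2d example as a plain script. The names split_at and pieces are mine, and dropping the empty leading piece is an assumption about what you want; the printed form of that empty piece may also differ between NumPy versions.

```python
import numpy as np

# Same sample data as the session above: a 10x3 array whose 3rd column runs 0..9.
arr = np.ones((10, 3))
arr[:, 2] = np.arange(10)

# Row indices where the 3rd column is a multiple of 3.
split_at = np.where(arr[:, 2] % 3 == 0)[0]

# np.split cuts before each index, so index 0 produces an empty leading piece.
pieces = np.split(arr, split_at)

# Assumption: empty pieces are not wanted, so filter them out.
pieces = [p for p in pieces if len(p) > 0]

for p in pieces:
    print(p[:, 2])
```

Each remaining piece starts at a row whose 3rd column is a multiple of 3.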

edited Jul 17 '15 at 20:45

answered Jul 17 '15 at 20:24

hpaulj

This probably is a dumb question, but could you explain the meaning of the In [390],...Out[392]? – whent1991 Jul 17 '15 at 20:30
I'm just constructing a sample 2d array with an interesting 3rd column. You didn't give us a real sample! – hpaulj Jul 17 '15 at 20:45
I'm not sure if this is what you are asking about, but @hpaulj is using IPython, which lets you enter a line of code and run it immediately. The stuff you type in is shown after In [number]. If an expression returns something and you do not assign it to a variable, IPython prints it after Out[number]. – rohanp Jul 17 '15 at 21:17
This should be the accepted answer. Using an external for-loop is not really the way to go with numpy. – Dux Jul 18 '15 at 16:52

score 0 · Accepted Answer · answered Jul 17 '15 at 20:21

I cannot think of any NumPy functions or tricks to do this. A simple solution using a for loop would be:

In [48]: arr = [(1,1,1), (1,1,2), (1,1,3), (1,1,4),(1,2,1),(1,2,2),(1,2,3),(1,3,1),(1,3,2),(1,3,3),(1,3,4),(1,3,5)]

In [49]: result = []

In [50]: for i in arr:
   ....:     if i[2] == 1:
   ....:         tempres = []
   ....:         result.append(tempres)
   ....:     tempres.append(i)
   ....:

In [51]: result
Out[51]:
[[(1, 1, 1), (1, 1, 2), (1, 1, 3), (1, 1, 4)],
 [(1, 2, 1), (1, 2, 2), (1, 2, 3)],
 [(1, 3, 1), (1, 3, 2), (1, 3, 3), (1, 3, 4), (1, 3, 5)]]
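
The loop above can also be wrapped in a function, which makes it easy to process each group in order afterwards. This is only a sketch: the name group_by_start and the start_value parameter are hypothetical, and opening a group for rows that come before the first 1 is my assumption, not something the question specifies.

```python
def group_by_start(rows, start_value=1):
    """Group rows into lists, starting a new list whenever row[2] == start_value."""
    groups = []
    for row in rows:
        # Open a new group at each start marker, or at the very first row
        # (assumption: rows before the first marker form their own group).
        if row[2] == start_value or not groups:
            groups.append([])
        groups[-1].append(row)
    return groups


arr = [(1, 1, 1), (1, 1, 2), (1, 2, 1), (1, 2, 2), (1, 2, 3)]
groups = group_by_start(arr)

# The groups come back in their original order, so a plain loop visits them in sequence.
for k, group in enumerate(groups):
    print(k, len(group))
```

Since the result is an ordinary list of lists, no extra labelling is needed; the position in the outer list identifies each group.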

Anand S Kumar

Apparently line 1 has invalid syntax and is unable to run, according to Python – whent1991 Jul 17 '15 at 20:40
do not copy the whole `In [48]:` things, those are my ipython prompts, just copy the portion from `arr = `. – Anand S Kumar Jul 17 '15 at 20:45
I want to run a code on each of the new arrays - does Python automatically run the code through the new arrays in order, or do I need some sort of labelling on the arrays? – whent1991 Jul 17 '15 at 21:09

emschorsch · Answer 3 · 2015-07-17T20:34:51.520

From looking at the documentation, it seems that passing np.split the indices at which to split will work best. For your specific example the following works if arr is already a 2-dimensional NumPy array:

np.split(arr, np.where(arr[:,2] == 1)[0])

arr[:,2] returns a 1-dimensional array holding the 3rd entry of each row. The colon says to take every row and the 2 says to take the 3rd column, which is the 3rd component.

We then use np.where to return all the places where the 3rd coordinate is a 1. We have to do np.where()[0] to get at the array of locations directly.

We then pass the indices where the 3rd coordinate is 1 to np.split, which splits at those locations.

Note that because the first entry has a 1 in the 3rd coordinate it will split before the first entry. This gives us one extra "split" array which is empty.
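
If the empty leading array is unwanted, one option is to drop a split index of 0 before calling np.split. The sketch below assumes a small made-up array shaped like the question's data; split_on_ones is a hypothetical helper name, not a NumPy function.

```python
import numpy as np


def split_on_ones(arr):
    """Split a 2-D array before every row whose 3rd column equals 1."""
    starts = np.where(arr[:, 2] == 1)[0]
    # A split index of 0 would yield an empty first piece, so skip it.
    if starts.size and starts[0] == 0:
        starts = starts[1:]
    return np.split(arr, starts)


# Made-up sample data in the same layout as the question's array.
arr = np.array([(1, 1, 1), (1, 1, 2), (1, 1, 3),
                (1, 2, 1), (1, 2, 2),
                (1, 3, 1), (1, 3, 2), (1, 3, 3), (1, 3, 4)])

groups = split_on_ones(arr)
for g in groups:
    print(g.tolist())
```

Slicing the result with [1:] would also remove the empty piece, but only when the first row really has a 1 in the 3rd column; checking the first index avoids discarding real data otherwise.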
