
I have an array:

    arr = [(1,1,1), (1,1,2), (1,1,3), (1,1,4)...(35,1,22),(35,1,23)]

I want to split my array according to the third value in each tuple. Each tuple whose third value is 1 should be the start of a new array. The results should be:

    [(1,1,1), (1,1,2),...(1,1,35)][(1,2,1), (1,2,2),...(1,2,46)]

and so on. I know numpy.split should do the trick, but I'm lost as to how to write the condition for the split.

whent1991

3 Answers


Here's a quick idea, working with a 1d array. It can be easily extended to work with your 2d array:

    In [385]: x=np.arange(10)

    In [386]: I=np.where(x%3==0)

    In [387]: I
    Out[387]: (array([0, 3, 6, 9]),)

    In [389]: np.split(x,I[0])
    Out[389]: 
    [array([], dtype=int64),
     array([0, 1, 2]),
     array([3, 4, 5]),
     array([6, 7, 8]),
     array([9])]

The key is to use np.where to find the indices at which you want np.split to act.
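For reference, here is the same 1-D idea as a standalone script, including the `import numpy as np` the session assumes. The condition `x % 3 == 0` is only the example's stand-in for "where a new piece starts"; the names `split_points` and `pieces` are illustrative.

```python
import numpy as np

x = np.arange(10)

# np.where returns a tuple with one index array per dimension; for 1-D input take [0].
split_points = np.where(x % 3 == 0)[0]   # array([0, 3, 6, 9])

# np.split cuts just *before* each index, so an index of 0 yields an empty first piece.
pieces = np.split(x, split_points)

for p in pieces:
    print(p)
```

Because 0 is one of the split points, the first element of `pieces` is the empty slice `x[:0]`.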


For a 2d arr

First make a sample 2d array, with something interesting in the 3rd column:

    In [390]: arr=np.ones((10,3))
    In [391]: arr[:,2]=np.arange(10)
    In [392]: arr
    Out[392]: 
    array([[ 1.,  1.,  0.],
           [ 1.,  1.,  1.],
           ...
           [ 1.,  1.,  9.]])

Then use the same where and boolean test to find the indices to split on:

    In [393]: I=np.where(arr[:,2]%3==0)

    In [395]: np.split(arr,I[0])
    Out[395]: 
    [array([], shape=(0, 3), dtype=float64),
     array([[ 1.,  1.,  0.],
           [ 1.,  1.,  1.],
           [ 1.,  1.,  2.]]),
     array([[ 1.,  1.,  3.],
           [ 1.,  1.,  4.],
           [ 1.,  1.,  5.]]),
     array([[ 1.,  1.,  6.],
           [ 1.,  1.,  7.],
           [ 1.,  1.,  8.]]),
     array([[ 1.,  1.,  9.]])]
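A sketch of the same pattern with the question's actual condition (third value equal to 1), assuming the list of tuples is first converted with `np.array`. The sample rows are the ones used in the loop-based answer; `data`, `starts` and `groups` are illustrative names. `np.split` cuts along axis 0 by default, so each piece is a block of whole rows.

```python
import numpy as np

# Sample rows in the question's format: the third value restarts at 1 for each group.
data = [(1, 1, 1), (1, 1, 2), (1, 1, 3), (1, 1, 4),
        (1, 2, 1), (1, 2, 2), (1, 2, 3),
        (1, 3, 1), (1, 3, 2), (1, 3, 3), (1, 3, 4), (1, 3, 5)]
arr = np.array(data)          # shape (12, 3), integer dtype

# Rows whose 3rd column equals 1 mark the start of a new group.
starts = np.where(arr[:, 2] == 1)[0]   # array([0, 4, 7])

# Split along axis 0 (rows); the first piece is empty because row 0 is a start.
groups = np.split(arr, starts)

for g in groups:
    print(g.shape)
```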
hpaulj
  • This probably is a dumb question, but could you explain the meaning of the In [390],...Out[392]? – whent1991 Jul 17 '15 at 20:30
  • I'm just constructing a sample 2d array with an interesting 3rd column. You didn't give us a real sample! – hpaulj Jul 17 '15 at 20:45
  • I'm not sure if this is what you are asking about, but @hpaulj is using IPython, which lets you enter a line of code and run it immediately. What you type in is shown after In [number]. If an expression returns something and you do not assign it to a variable, IPython prints it as Out[number]. – rohanp Jul 17 '15 at 21:17
  • This should be the accepted answer. Using an external for-loop is not really the way to go with numpy. – Dux Jul 18 '15 at 16:52

I cannot think of any numpy function or trick to do this. A simple solution using a for loop would be:

    In [48]: arr = [(1,1,1), (1,1,2), (1,1,3), (1,1,4),(1,2,1),(1,2,2),(1,2,3),(1,3,1),(1,3,2),(1,3,3),(1,3,4),(1,3,5)]

    In [49]: result = []

    In [50]: for i in arr:
       ....:     if i[2] == 1:
       ....:         tempres = []
       ....:         result.append(tempres)
       ....:     tempres.append(i)
       ....:

    In [51]: result
    Out[51]:
    [[(1, 1, 1), (1, 1, 2), (1, 1, 3), (1, 1, 4)],
     [(1, 2, 1), (1, 2, 2), (1, 2, 3)],
     [(1, 3, 1), (1, 3, 2), (1, 3, 3), (1, 3, 4), (1, 3, 5)]]
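In a fresh session the loop above raises a `NameError` if the first tuple's third value is not 1, because `tempres` is only created inside the `if`. A hedged variant, wrapped in a hypothetical helper `split_on_restart`, also starts a group at the very first row:

```python
def split_on_restart(rows):
    """Group tuples so that each group starts at a row whose third value is 1.

    Rows before the first such row (if any) form their own leading group
    instead of causing a NameError.
    """
    result = []
    current = None
    for row in rows:
        # Start a new group on a restart value, or on the very first row.
        if row[2] == 1 or current is None:
            current = []
            result.append(current)
        current.append(row)
    return result
```

The groups come back in their original order, so they can be processed one after another with a plain `for group in split_on_restart(arr):` loop; no extra labels are needed.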
Anand S Kumar
  • Apparently line 1 has invalid syntax and is unable to run, according to Python – whent1991 Jul 17 '15 at 20:40
  • Don't copy the `In [48]:` parts; those are my IPython prompts. Just copy the code starting from `arr = `. – Anand S Kumar Jul 17 '15 at 20:45
  • I want to run some code on each of the new arrays. Does Python automatically run the code through the new arrays in order, or do I need some sort of labeling on the arrays? – whent1991 Jul 17 '15 at 21:09

From looking at the documentation, it seems that passing the indices to split on works best. For your specific example, the following works if arr is already a 2-D numpy array (for example, one created with np.array(arr) from your list of tuples):

    np.split(arr, np.where(arr[:,2] == 1)[0])

arr[:,2] returns a 1-D array holding the 3rd entry of each row (i.e. of each original tuple). The colon says to take every row, and the 2 says to take the 3rd column, since indices start at 0.

We then use np.where to find all the rows where the 3rd coordinate is 1. np.where returns a tuple with one index array per dimension, so for this 1-D condition we take np.where(...)[0] to get the array of row indices directly.
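A small sketch of what the two pieces return, on a hypothetical three-row sample (`third`, `hits` and `indices` are illustrative names):

```python
import numpy as np

arr = np.array([(1, 1, 1), (1, 1, 2), (1, 2, 1)])

third = arr[:, 2]               # every row (:), column index 2 -> array([1, 2, 1])
hits = np.where(third == 1)     # tuple with one index array per dimension: (array([0, 2]),)
indices = hits[0]               # the index array itself: array([0, 2])
```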

Finally, we pass those indices to np.split, which splits the array just before each of them.

Note that because the first entry has a 1 in the 3rd coordinate, the array is also split before the first entry. This gives us one extra "split" array at the front, which is empty.
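One way to discard that empty leading piece is to filter out zero-length results. This is a sketch on hypothetical sample rows. np.where returns unique, sorted indices, so the only piece that can be empty is the leading one, and only when row 0 starts a group. If the first row does not start a group, nothing is dropped.

```python
import numpy as np

data = [(1, 1, 1), (1, 1, 2), (1, 2, 1), (1, 2, 2), (1, 2, 3)]
arr = np.array(data)

pieces = np.split(arr, np.where(arr[:, 2] == 1)[0])

# len() of a 2-D array is its number of rows; keep only non-empty pieces.
groups = [p for p in pieces if len(p) > 0]
```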

emschorsch