I'm trying to sort a numpy array by a specific column (in-place) using the solution from this answer. For the most part it works, but it fails on any array that's a view on another array:

In [35]: columnnum = 2

In [36]: a = np.array([[1,2,3], [4,7,5], [9,0,1]])

In [37]: a
Out[37]: 
array([[1, 2, 3],
       [4, 7, 5],
       [9, 0, 1]])

In [38]: b = a[:,(0, 2)]

In [39]: b
Out[39]: 
array([[1, 3],
       [4, 5],
       [9, 1]])

In [40]: a.view(','.join([a.dtype.str] * a.shape[1])).sort(order=['f%d' % columnnum], axis=0)

In [41]: a
Out[41]: 
array([[9, 0, 1],
       [1, 2, 3],
       [4, 7, 5]])

In [42]: b.view(','.join([b.dtype.str] * b.shape[1])).sort(order=['f%d' % columnnum], axis=0)
ValueError: new type not compatible with array.

It looks like numpy doesn't support views of views, which makes a certain amount of sense, but I now can't figure out how to get the view I need for any array, whether it itself is a view or not. So far, I haven't been able to find any way to get the necessary information about the view I have to construct the new one I need.

For now, I'm using l = l[l[:,columnnum].argsort()], which works fine but is not truly in-place: it builds a new array and rebinds the name. Since I'm operating on large datasets, I'd like to avoid the extra memory overhead of the argsort() call (the list of indexes). Is there a way either to get the necessary information about the view or to do the sort by column some other way?

Linuxios
  • My hunch is you're going to be out of luck. I don't know that much about the numpy internals, but it looks to me like you are trying to push the view beyond what can be done without copying the data. – BrenBarn Nov 29 '16 at 06:53
  • @BrenBarn: Probably. I just wanted to see if anyone here had any clever hacks. My current solution is just to error on data that can't be viewed and tell the caller to make a copy first... Ugly, but at least I can provide a useful error message. – Linuxios Nov 29 '16 at 16:37

1 Answer

In [1019]: a=np.array([[1,2,3],[4,7,5],[9,0,1]])
In [1020]: b=a[:,(0,2)]

This is the view of a that you are sorting: a structured array with 3 fields. It uses the same data buffer, but interprets each group of 3 ints as the fields of one record rather than as 3 columns.

In [1021]: a.view('i,i,i')
Out[1021]: 
array([[(1, 2, 3)],
       [(4, 7, 5)],
       [(9, 0, 1)]], 
      dtype=[('f0', '<i4'), ('f1', '<i4'), ('f2', '<i4')])

By the same logic, you try to view b:

In [1022]: b.view('i,i')
/usr/local/bin/ipython3:1: DeprecationWarning: Changing the shape of non-C contiguous array by
descriptor assignment is deprecated. To maintain
the Fortran contiguity of a multidimensional Fortran
array, use 'a.T.view(...).T' instead
  #!/usr/bin/python3
....
ValueError: new type not compatible with array.

But if I use 3 fields instead of 2, it works (but with the same warning):

In [1023]: b.view('i,i,i')
/usr/local/bin/ipython3:1: DeprecationWarning:...
Out[1023]: 
array([[(1, 4, 9), (3, 5, 1)]], 
      dtype=[('f0', '<i4'), ('f1', '<i4'), ('f2', '<i4')])

The problem is that b is Fortran order. Check b.flags or

In [1026]: a.strides
Out[1026]: (12, 4)
In [1027]: b.strides
Out[1027]: (4, 12)

b is a copy, not a view. I don't know, off hand, why this construction of b changed the order.
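
A quick way to confirm the copy-versus-view distinction is sketched below. It assumes only `np.shares_memory` and the `flags` attribute; the name `c` is introduced here for illustration. No output is shown for the flags of b because, as the indexing docs note, the layout of an advanced indexing result is not guaranteed.

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 7, 5], [9, 0, 1]])
b = a[:, (0, 2)]   # advanced indexing (sequence of column indexes): a copy
c = a[:, :2]       # basic slicing: a true view onto a's buffer

print(np.shares_memory(a, b))   # False: b owns separate data
print(np.shares_memory(a, c))   # True: c reuses a's data

# Layout of the advanced indexing result; may differ between NumPy versions.
print(b.flags['C_CONTIGUOUS'], b.flags['F_CONTIGUOUS'])

# The sliced view skips one column per row, so it is not C contiguous.
print(c.flags['C_CONTIGUOUS'])  # False
```

Note that `b.base` is not a reliable test here, since an advanced indexing result can carry a non-None base even though it does not share data with a.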

Heeding the warning, I can do:

In [1047]: b.T.view('i,i,i').T
Out[1047]: 
array([[(1, 4, 9), (3, 5, 1)]], 
      dtype=[('f0', '<i4'), ('f1', '<i4'), ('f2', '<i4')])

A default copy (order c) of b can be viewed as 2 fields:

In [1042]: b1=b.copy()
In [1043]: b1.strides
Out[1043]: (8, 4)
In [1044]: b1.view('i,i')
Out[1044]: 
array([[(1, 3)],
       [(4, 5)],
       [(9, 1)]], 
      dtype=[('f0', '<i4'), ('f1', '<i4')])
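
As a sketch of what that implies for the original sort: the structured-view sort does work on the C-order copy, but it reorders only the copy. The dtype below is built from `b1.dtype` rather than `'i,i'` so that it matches whatever integer size the platform uses; that choice is an assumption of this example, not something from the session above.

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 7, 5], [9, 0, 1]])
b = a[:, (0, 2)]

b1 = b.copy()  # ndarray.copy defaults to C order
rec_dtype = np.dtype([('f0', b1.dtype), ('f1', b1.dtype)])

# Sort the rows of b1 in place by its second column (field 'f1').
b1.view(rec_dtype).sort(order=['f1'], axis=0)

print(b1)  # rows reordered by the second column
print(b)   # unchanged: the sort touched only the copy
```

So this costs a full copy of b, which is the overhead the question was trying to avoid.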

A footnote on: https://docs.scipy.org/doc/numpy/reference/arrays.indexing.html

The memory layout of an advanced indexing result is optimized for each indexing operation and no particular memory order can be assumed.

====================

b in this case was constructed with advanced indexing, and thus is a copy. But even a true view might not be viewable in this way:

In [1052]: a[:,:2].view('i,i')
....
ValueError: new type not compatible with array.

In [1054]: a[:,:2].copy().view('i,i')
Out[1054]: 
array([[(1, 2)],
       [(4, 7)],
       [(9, 0)]], 
      dtype=[('f0', '<i4'), ('f1', '<i4')])

The view selects a subset of the values: 'i,i,x,i,i,x,i,i,x...', and that does not translate into a structured dtype.

The structured view of a does: '(i,i,i),(i,i,i),...'
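
Putting these observations together, one possible workaround is to use the structured-view sort only when the array is C contiguous and to fall back to argsort otherwise. This is a minimal sketch, not a NumPy feature: `sort_rows_by_column` is a hypothetical helper name, and the fallback still allocates a temporary array, so it only avoids the extra memory in the contiguous case.

```python
import numpy as np

def sort_rows_by_column(arr, col):
    """Sort the rows of a 2-D array by column `col`, writing into `arr`.

    Hypothetical helper for illustration only.
    """
    if arr.ndim != 2:
        raise ValueError("expected a 2-D array")
    if arr.flags['C_CONTIGUOUS']:
        # Each row maps onto one record, so a structured view is possible
        # and ndarray.sort can reorder the rows without a temporary copy.
        rec_dtype = np.dtype([('f%d' % i, arr.dtype)
                              for i in range(arr.shape[1])])
        arr.view(rec_dtype).sort(order=['f%d' % col], axis=0)
    else:
        # Strided views and F-order arrays: the right-hand side is a
        # temporary sorted copy, which is then written back into arr.
        arr[:] = arr[arr[:, col].argsort()]

a = np.array([[1, 2, 3], [4, 7, 5], [9, 0, 1]])
sort_rows_by_column(a, 2)          # contiguous: structured-view path
print(a)

base = np.array([[1, 2, 3], [4, 7, 5], [9, 0, 1]])
sort_rows_by_column(base[:, :2], 1)  # strided view: argsort fallback
print(base)                          # only the first two columns are reordered
```

The two paths can order tied keys differently: the structured sort breaks ties using the remaining fields, while the default argsort is not stable.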

You can select a subset of the fields of a structured array:

In [1059]: a1=a.view('i,i,i')
In [1060]: a1
Out[1060]: 
array([[(1, 2, 3)],
       [(4, 7, 5)],
       [(9, 0, 1)]], 
      dtype=[('f0', '<i4'), ('f1', '<i4'), ('f2', '<i4')])
In [1061]: b1=a1[['f0','f2']]
In [1062]: b1
Out[1062]: 
array([[(1, 3)],
       [(4, 5)],
       [(9, 1)]], 
      dtype=[('f0', '<i4'), ('f2', '<i4')])

But there are limits as to what you can do with such a view. Values can be changed in a1, and seen in a and b1. But I get an error if I try to change values in b1. This is on the development edge.

hpaulj