
Given an array:

arr = np.array([[1, 3, 7], [4, 9, 8]]); arr

array([[1, 3, 7],
       [4, 9, 8]])

And given its indices:

np.indices(arr.shape)

array([[[0, 0, 0],
        [1, 1, 1]],

       [[0, 1, 2],
        [0, 1, 2]]])

How can I stack them side by side to form a new 2D array, with one row per element holding its row index, column index, and value? This is what I'd like:

array([[0, 0, 1],
       [0, 1, 3],
       [0, 2, 7],
       [1, 0, 4],
       [1, 1, 9],
       [1, 2, 8]])

This is my current solution:

def foo(arr):
    return np.hstack((np.indices(arr.shape).reshape(2, arr.size).T, arr.reshape(-1, 1)))

It works, but is there something shorter/more elegant to carry this operation out?

cs95
  • What happens if the array is a different data type to np.intp? What type should the output be? – Eric Aug 25 '17 at 12:24
  • @Eric Ah, I see what you mean. If the array is a float, I think it is okay to cast the indices to float. – cs95 Aug 25 '17 at 12:25
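To make the dtype question concrete, here is a small sketch of what the np.hstack approach from the question does with a float array. The sample values are made up for illustration; the point is only that the integer indices get promoted to the array's dtype:

```python
import numpy as np

# Hypothetical float input, chosen only to illustrate the dtype behaviour
arr = np.array([[1.5, 3.0], [4.0, 9.25]])

# np.indices yields integer index grids; flatten them to one (row, col) pair per element
idx = np.indices(arr.shape).reshape(arr.ndim, arr.size).T

# hstack promotes the integer indices to the common dtype (float64 here)
stacked = np.hstack((idx, arr.reshape(-1, 1)))

print(idx.dtype.kind, stacked.dtype)
```

So with a single homogeneous output array, the indices end up as floats whenever the values are floats.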

2 Answers

4

This approach preallocates the output array, then fills in the indices and the array values with broadcasted assignments -

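import numpy as np

def indices_merged_arr(arr):
    m, n = arr.shape
    # Open grids: I has shape (m, 1), J has shape (1, n)
    I, J = np.ogrid[:m, :n]
    # One (row, col, value) triple per element, in the array's dtype
    out = np.empty((m, n, 3), dtype=arr.dtype)
    out[..., 0] = I      # row indices, broadcast across columns
    out[..., 1] = J      # column indices, broadcast across rows
    out[..., 2] = arr
    # Collapse the first two axes into rows of (row, col, value)
    return out.reshape(-1, 3)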

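Note that this avoids np.indices(arr.shape), which materializes a full index array of shape (2, m, n). np.ogrid instead returns open grids of shape (m, 1) and (1, n) that are broadcast during assignment, which is likely cheaper.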
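As a rough illustration of the difference, the sketch below compares the shapes produced by np.indices and np.ogrid for a small 2 x 3 case and checks that broadcasting the open grids gives the same index values. The sizes m and n are arbitrary and chosen only for the example:

```python
import numpy as np

m, n = 2, 3  # arbitrary sizes for illustration

# np.indices builds dense grids: one full (m, n) array per axis
full = np.indices((m, n))

# np.ogrid returns "open" grids that only span their own axis
I, J = np.ogrid[:m, :n]

# Broadcasted assignment expands the open grids on the fly
out = np.empty((m, n, 2), dtype=int)
out[..., 0] = I  # shape (m, 1), repeated across columns
out[..., 1] = J  # shape (1, n), repeated across rows

print(full.shape, I.shape, J.shape)
```

The dense result has 2 * m * n elements, while the two open grids together hold only m + n.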

Sample run -

In [10]: arr = np.array([[1, 3, 7], [4, 9, 8]])

In [11]: indices_merged_arr(arr)
Out[11]: 
array([[0, 0, 1],
       [0, 1, 3],
       [0, 2, 7],
       [1, 0, 4],
       [1, 1, 9],
       [1, 2, 8]])

Performance

arr = np.random.randn(100000, 2)

%timeit df = pd.DataFrame(np.hstack((np.indices(arr.shape).reshape(2, arr.size).T,\
                                arr.reshape(-1, 1))), columns=['x', 'y', 'value'])
100 loops, best of 3: 4.97 ms per loop

%timeit pd.DataFrame(indices_merged_arr_divakar(arr), columns=['x', 'y', 'value'])
100 loops, best of 3: 3.82 ms per loop

%timeit pd.DataFrame(indices_merged_arr_eric(arr), columns=['x', 'y', 'value'], dtype=np.float32)
100 loops, best of 3: 5.59 ms per loop

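Note: the timings include the conversion to a pandas DataFrame, which is the eventual use case for this solution. Here indices_merged_arr_divakar refers to the function in this answer and indices_merged_arr_eric to the function in Eric's answer.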

cs95
Divakar
  • Okay, this looks simple. Would you consider adding some timings for larger 2D arrays, just for completeness? – cs95 Aug 24 '17 at 09:23
  • @cᴏʟᴅsᴘᴇᴇᴅ Do you have a loopy solution that I could compare against? – Divakar Aug 24 '17 at 09:24
  • I've edited the solution I have in my question as a function, if that helps. This is the only solution I have. – cs95 Aug 24 '17 at 09:25
  • @cᴏʟᴅsᴘᴇᴇᴅ That doesn't seem any better. I guess a better one could use [`this one`](https://stackoverflow.com/a/11146645/). – Divakar Aug 24 '17 at 12:36
  • Added some perf stats. Your solution is great! – cs95 Aug 25 '17 at 10:33
3

A more general answer for n-dimensional arrays that also handles other dtypes correctly:

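import numpy as np

def indices_merged_arr(arr):
    # Structured dtype: an integer index vector plus the original value type
    out = np.empty(arr.shape, dtype=[
        ('index', np.intp, (arr.ndim,)),  # tuple shape keeps a subarray field even when ndim == 1
        ('value', arr.dtype)
    ])
    out['value'] = arr
    for i, l in enumerate(arr.shape):
        # Shape that varies along axis i only, so it broadcasts over the other axes
        shape = (1,)*i + (-1,) + (1,)*(arr.ndim-1-i)
        out['index'][..., i] = np.arange(l).reshape(shape)
    return out.ravel()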

This returns a structured array with an index column and a value column, which can be of different types.
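A short usage sketch may help show what the structured result looks like. The function is repeated here so the snippet runs on its own (with the subarray shape written as a tuple), and the float sample array is made up for illustration:

```python
import numpy as np

def indices_merged_arr(arr):
    # Same approach as above: integer 'index' field, 'value' field in arr's dtype
    out = np.empty(arr.shape, dtype=[
        ('index', np.intp, (arr.ndim,)),
        ('value', arr.dtype),
    ])
    out['value'] = arr
    for i, l in enumerate(arr.shape):
        shape = (1,)*i + (-1,) + (1,)*(arr.ndim-1-i)
        out['index'][..., i] = np.arange(l).reshape(shape)
    return out.ravel()

# Hypothetical float input for illustration
arr = np.array([[1.5, 3.0, 7.0], [4.0, 9.0, 8.0]])
res = indices_merged_arr(arr)

idx = res['index']   # integer array with one index vector per element
vals = res['value']  # values keep the input's float dtype

print(idx.shape, idx.dtype.kind, vals.dtype)
```

Unlike the single homogeneous array, the indices stay integers here even though the values are floats.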

Eric