10

I need to create a 2D array where each row may start and end with a different number. Assume that the first and last elements of each row are given and all other elements are just interpolated according to the length of the rows. In a simple case, let's say I want to create a 3x3 array where every row starts at 0 but ends at a different value, given by W below:

array([[ 0.,  1.,  2.],
       [ 0.,  2.,  4.],
       [ 0.,  3.,  6.]])

Is there a better way to do this than the following:

D=np.ones((3,3))*np.arange(0,3)
D=D/D[:,-1] 
W=np.array([2,4,6]) # last element of each row assumed given
Res= (D.T*W).T  
Divakar
dayum
  • 1
    If you want to use pandas: `pd.Series(W).apply(lambda e: np.linspace(0, e, 3))` – Zeugma Nov 16 '16 at 05:28
  • Basically you have two vectors (first and last columns of your matrix), correct? And you would then like to interpolate some values for each row. – Kartik Nov 16 '16 at 05:31
  • 1
    @dayum if you want to change the start positions, it's the same approach but you build a df with two vectors start and stop in it, and you call apply again with the lambda argument being df.start, df.end, 3 – Zeugma Nov 16 '16 at 05:48

4 Answers

15

Here's an approach using broadcasting -

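import numpy as np

def create_ranges(start, stop, N, endpoint=True):
    # start, stop: 1D NumPy arrays with the first/last value of each row
    # N: number of elements per row
    divisor = N - 1 if endpoint else N
    # Per-row step size
    steps = (1.0/divisor) * (stop - start)
    # Broadcast steps (column) against 0..N-1 (row), then shift by start
    return steps[:,None]*np.arange(N) + start[:,None]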

Sample run -

In [22]: # Setup start, stop for each row and no. of elems in each row
    ...: start = np.array([1,4,2])
    ...: stop  = np.array([6,7,6])
    ...: N = 5
    ...: 

In [23]: create_ranges(start, stop, 5)
Out[23]: 
array([[ 1.  ,  2.25,  3.5 ,  4.75,  6.  ],
       [ 4.  ,  4.75,  5.5 ,  6.25,  7.  ],
       [ 2.  ,  3.  ,  4.  ,  5.  ,  6.  ]])

In [24]: create_ranges(start, stop, 5, endpoint=False)
Out[24]: 
array([[ 1. ,  2. ,  3. ,  4. ,  5. ],
       [ 4. ,  4.6,  5.2,  5.8,  6.4],
       [ 2. ,  2.8,  3.6,  4.4,  5.2]])
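
One way to sanity-check the broadcasting approach is to compare it row by row with `np.linspace`. This is a minimal sketch, assuming the `create_ranges` function above (repeated here so the snippet runs on its own); `expected` is just an illustrative name.

```python
import numpy as np

def create_ranges(start, stop, N, endpoint=True):
    # Same broadcasting approach as above, repeated so this snippet is self-contained.
    divisor = N - 1 if endpoint else N
    steps = (1.0 / divisor) * (stop - start)
    return steps[:, None] * np.arange(N) + start[:, None]

start = np.array([1, 4, 2])
stop = np.array([6, 7, 6])

for endpoint in (True, False):
    # Reference result: one np.linspace call per row.
    expected = np.array([np.linspace(a, b, 5, endpoint=endpoint)
                         for a, b in zip(start, stop)])
    assert np.allclose(create_ranges(start, stop, 5, endpoint=endpoint), expected)
```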

Let's leverage multi-core!

For large data, we can use the numexpr module to leverage multiple cores and gain memory efficiency, and hence performance -

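import numpy as np
import numexpr as ne

def create_ranges_numexpr(start, stop, N, endpoint=True):
    divisor = N - 1 if endpoint else N
    # Column vectors so each row gets its own start/stop
    s0 = start[:,None]
    s1 = stop[:,None]
    r = np.arange(N)
    # numexpr looks up divisor, s0, s1 and r from the local scope
    return ne.evaluate('((1.0/divisor) * (s1 - s0))*r + s0')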
Divakar
  • Why not leverage `linspace`? – Zeugma Nov 16 '16 at 05:29
  • 1
    @Divakar I meant something like [this](http://stackoverflow.com/a/16887295/624829) – Zeugma Nov 16 '16 at 05:33
  • I'm not looking for a loop here. I'm not that fluent in numpy (too pandas-biased), but I'm wondering if there is a way to broadcast the function in numpy somehow? – Zeugma Nov 16 '16 at 05:36
  • 2
    @Boud Well there is `np.apply_along_axis`, but that's not meant for performance. Broadcasting elements is what actually gives performance when it comes to NumPy arrays. – Divakar Nov 16 '16 at 05:42
  • @Boud, Saullo's iteration on `linspace` is a nice, clear one, and probably the one I'd use on the spur of the moment. But I wouldn't try to make it any fancier than necessary. For a large array, Divakar's answer is only 2x faster than the OP's. – hpaulj Nov 16 '16 at 07:06
  • Thanks @Boud, found [this solution](https://stackoverflow.com/a/16887425/3703716) due to your link. – mab Feb 11 '19 at 15:59
7

NumPy >= 1.16.0:

It is now possible to supply array-like values to the start and stop parameters of np.linspace.

For the example given in the question the syntax would be:

>>> np.linspace((0, 0, 0), (2, 4, 6), 3, axis=1)
array([[0., 1., 2.],
       [0., 2., 4.],
       [0., 3., 6.]])

The new axis parameter specifies the axis of the result along which the samples are stored. By default it is 0:

>>> np.linspace((0, 0, 0), (2, 4, 6), 3)
array([[0., 0., 0.],
       [1., 2., 3.],
       [2., 4., 6.]])
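
The same call also handles a different start value per row, and it accepts `endpoint=False`. A short sketch, assuming NumPy >= 1.16 and reusing the start and stop values from the sample run in the broadcasting answer:

```python
import numpy as np

start = np.array([1, 4, 2])
stop = np.array([6, 7, 6])

# One row per (start, stop) pair, 5 samples along axis 1.
with_end = np.linspace(start, stop, 5, axis=1)
# Same, but the stop value itself is excluded.
without_end = np.linspace(start, stop, 5, endpoint=False, axis=1)

print(with_end)
print(without_end)
```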
Community
Georgy
1

Like the OP's, this use of linspace assumes the start is 0 for all rows.

x=np.linspace(0,1,N)[:,None]*np.arange(0,2*N,2)

(edit - this is the transpose of what I should get; either transpose it or switch the use of [:,None])
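
For reference, switching the `[:,None]` as described gives one row per stop value. A small sketch with N=3 (the value of N is chosen here only for illustration):

```python
import numpy as np

N = 3
stops = np.arange(0, 2 * N, 2)  # per-row last elements: 0, 2, 4
# Column vector of stops times a row of weights in [0, 1].
x = stops[:, None] * np.linspace(0, 1, N)
print(x)
```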

For N=3000, it's noticeably faster than @Divakar's solution. I'm not entirely sure why.

In [132]: timeit N=3000;x=np.linspace(0,1,N)[:,None]*np.arange(0,2*N,2)
10 loops, best of 3: 91.7 ms per loop
In [133]: timeit create_ranges(np.zeros(N),np.arange(0,2*N,2),N)
1 loop, best of 3: 197 ms per loop
In [134]: def foo(N):
     ...:     D=np.ones((N,N))*np.arange(N)
     ...:     D=D/D[:,-1]
     ...:     W=np.arange(0,2*N,2)
     ...:     return (D.T*W).T
     ...: 
In [135]: timeit foo(3000)
1 loop, best of 3: 454 ms per loop

============

With starts and stops I could use:

In [201]: starts=np.array([1,4,2]); stops=np.array([6,7,8])
In [202]: x=(np.linspace(0,1,5)[:,None]*(stops-starts)+starts).T
In [203]: x
Out[203]: 
array([[ 1.  ,  2.25,  3.5 ,  4.75,  6.  ],
       [ 4.  ,  4.75,  5.5 ,  6.25,  7.  ],
       [ 2.  ,  3.5 ,  5.  ,  6.5 ,  8.  ]])

With the extra calculations, that makes it a bit slower than create_ranges.

In [208]: timeit N=3000;starts=np.zeros(N);stops=np.arange(0,2*N,2);x=(np.linspace(0,1,N)[:,None]*(stops-starts)+starts).T
1 loop, best of 3: 227 ms per loop

All these solutions are just variations on the idea of doing a linear interpolation between the starts and stops.
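
Written out, with $s_i$ and $e_i$ the given first and last elements of row $i$ and $N$ points per row (symbols introduced here only for illustration), each entry is:

```latex
x_{ij} = s_i + \frac{j}{N-1}\,(e_i - s_i), \qquad j = 0, 1, \dots, N-1
```

When the endpoint is excluded, as with `endpoint=False` in `create_ranges`, the divisor is $N$ instead of $N-1$.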

hpaulj
  • Since the question states `"first and last element of each row is given"`, how would you incorporate the start and stop values for each row into `linspace` based solution? – Divakar Nov 16 '16 at 08:41
0

I extended the functionality a bit, based on @Divakar's solution. It sacrifices some speed, but N can now be an array with a different length for each row instead of only a scalar. Plus, this version is faster than @Saullo's solution.

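def create_ranges_divak(starts, stops, N, endpoint=True):
    # N may be a scalar or an array with one length per row
    N = np.asarray(N)
    divisor = N - 1 if endpoint else N
    steps = (1.0/divisor) * (stops - starts)
    uni_N = np.unique(N)
    if len(uni_N) == 1:
        # All rows share one length: return a 2D array via broadcasting
        return steps[:,None]*np.arange(uni_N[0]) + starts[:,None]
    else:
        # Rows differ in length: return a list of 1D arrays
        return [step * np.arange(n) + start for start, step, n in zip(starts, steps, N)]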
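
When each row needs a different number of points, a plain loop over `np.linspace` is a simple alternative that returns a list of 1D arrays. This is a minimal sketch rather than a tuned implementation, and `ragged_ranges` is a hypothetical helper name:

```python
import numpy as np

def ragged_ranges(starts, stops, counts, endpoint=True):
    # One np.linspace call per row; rows may have different lengths.
    return [np.linspace(a, b, int(n), endpoint=endpoint)
            for a, b, n in zip(starts, stops, counts)]

rows = ragged_ranges([1, 4, 2], [6, 7, 6], [5, 4, 3])
for row in rows:
    print(row)
```

Because the rows differ in length, the result cannot be a regular 2D array, which is why a list is returned.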
Gabriel