7

I have a couple of lists:

a = [1,2,3]
b = [1,2,3,4,5,6]

which are of variable length.

I want to return a vector of length five, such that if the input list length is < 5 then it will be padded with zeros on the right, and if it is > 5, then it will be truncated at the 5th element.

For example, input a would return np.array([1,2,3,0,0]), and input b would return np.array([1,2,3,4,5]).

I feel like I ought to be able to use np.pad, but I can't seem to follow the documentation.

ali_m
frazman
  • To use np.pad: np.pad(x, (0, 5 - len(x)), mode='constant', constant_values=0) gives array([1, 2, 3, 0, 0]) for x = np.array([1,2,3]). However, it won't work for len(x) > 5, because the pad width would be negative, so in that case just slice the array. For example, if y = np.array([1,2,3,4,5,6]) then y[:5] gives array([1, 2, 3, 4, 5]) –  Aug 19 '15 at 23:01
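
The two cases in the comment above can be folded into one call by slicing first, so the pad width never goes negative. This is only a sketch under that assumption; the name `pad_with_np_pad` and the default `n=5` are made up for illustration and do not come from the thread.

```python
import numpy as np

def pad_with_np_pad(x, n=5):
    # Hypothetical helper: truncate first, then pad on the right.
    x = np.asarray(x)[:n]  # at most n elements remain, so n - len(x) >= 0
    # (0, k) means: add 0 values before the data and k values after it.
    return np.pad(x, (0, n - len(x)), mode="constant", constant_values=0)

print(pad_with_np_pad([1, 2, 3]))           # padded on the right with zeros
print(pad_with_np_pad([1, 2, 3, 4, 5, 6]))  # truncated to the first five
```

When the input already has five or more elements the pad width is (0, 0), so np.pad only returns a copy of the slice.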

3 Answers

8

I haven't benchmarked this, so I can't say how fast it is, but it does what you need.

In [22]: pad = lambda a,i : a[0:i] if len(a) > i else a + [0] * (i-len(a))

In [23]: pad([1,2,3], 5)
Out[23]: [1, 2, 3, 0, 0]

In [24]: pad([1,2,3,4,5,6,7], 5)
Out[24]: [1, 2, 3, 4, 5]
Sait
  • 2
    For a numpy version (passing dtype so the zeros don't turn an integer array into floats): `pad = lambda a, i: a[0: i] if a.shape[0] > i else np.hstack((a, np.zeros(i - a.shape[0], dtype=a.dtype)))` – Sid Aug 20 '15 at 04:55
3

np.pad is overkill here; it is better suited to adding a border all around a 2d image than to appending a few zeros to a list.

I like the zip_longest approach, especially if the inputs are lists and don't need to be arrays. It's probably the closest you'll find to code that operates on all the lists at once in compiled code.

a, b = zip(*itertools.zip_longest(a, b, fillvalue=0))  # itertools.izip_longest on Python 2

is a version that does not use np.array at all (saving some array overhead)

But by itself it does not truncate. It still needs something like [x[:5] for x in (a,b)].
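
Putting the padding and the truncation together might look like the sketch below. It assumes Python 3 (`itertools.zip_longest`); the function name `pad_truncate_all` and the extra dummy list are my own additions, not part of the answer. The dummy list is there because zip_longest only pads up to the longest input, which would leave everything too short if every list had fewer than n items.

```python
from itertools import zip_longest

def pad_truncate_all(lists, n=5, fill=0):
    # Hypothetical helper. A dummy list of length n is appended so that
    # zip_longest yields at least n rows, whatever the input lengths are.
    rows = zip_longest(*lists, [fill] * n, fillvalue=fill)
    # Transpose back to one padded tuple per input and drop the dummy.
    columns = list(zip(*rows))[:-1]
    # zip_longest pads to the longest input, so truncate explicitly.
    return [list(col[:n]) for col in columns]

a = [1, 2, 3]
b = [1, 2, 3, 4, 5, 6]
print(pad_truncate_all([a, b]))
```

The results are plain Python lists; nothing here touches numpy.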

Here's my variation on ali_m's function, working with a simple list or 1d array:

def foo_1d(x, n=5):
    x = np.asarray(x)
    assert x.ndim==1
    s = np.min([x.shape[0], n])
    ret = np.zeros((n,), dtype=x.dtype)
    ret[:s] = x[:s]
    return ret

In [772]: [foo_1d(x) for x in [[1,2,3], [1,2,3,4,5], np.arange(10)[::-1]]]
Out[772]: [array([1, 2, 3, 0, 0]), array([1, 2, 3, 4, 5]), array([9, 8, 7, 6, 5])]

One way or another, the numpy solutions do the same thing: construct a blank array of the desired shape, and then fill it with the relevant values from the original.

One other detail - when truncating the solution could, in theory, return a view instead of a copy. But that requires handling that case separately from a pad case.
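
A rough sketch of what handling the two cases separately could look like, assuming 1d input. The name `pad_or_view` is hypothetical. Note that the truncating branch returns a view only when the input is already an ndarray; a list is first converted by np.asarray, so the view is onto that fresh array.

```python
import numpy as np

def pad_or_view(x, n=5):
    # Hypothetical helper; assumes x is 1d.
    x = np.asarray(x)
    if x.shape[0] >= n:
        # Basic slicing returns a view: no data is copied.
        return x[:n]
    # Padding needs a new buffer, so this branch always copies.
    out = np.zeros(n, dtype=x.dtype)
    out[:x.shape[0]] = x
    return out

big = np.arange(10)
v = pad_or_view(big)
print(np.shares_memory(v, big))  # the truncated result aliases the input
```

The catch is that writing to the truncated result also changes the original array, which is why the copying versions are the safer default.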


If the desired output is a list of equal length arrays, it may be worthwhile collecting them in a 2d array.

In [792]: def foo1(x, out):
    x = np.asarray(x)
    s = np.min((x.shape[0], out.shape[0]))
    out[:s] = x[:s]

In [794]: lists = [[1,2,3], [1,2,3,4,5], np.arange(10)[::-1], []]

In [795]: ret=np.zeros((len(lists),5),int)
In [796]: for i,xx in enumerate(lists):
    foo1(xx, ret[i,:])
In [797]: ret
Out[797]: 
array([[1, 2, 3, 0, 0],
       [1, 2, 3, 4, 5],
       [9, 8, 7, 6, 5],
       [0, 0, 0, 0, 0]])
hpaulj
1

Pure python version, where a is a python list (not a numpy array): a[:n] + [0,]*(n-len(a)).

For example:

In [42]: n = 5

In [43]: a = [1, 2, 3]

In [44]: a[:n] + [0,]*(n - len(a))
Out[44]: [1, 2, 3, 0, 0]

In [45]: a = [1, 2, 3, 4]

In [46]: a[:n] + [0,]*(n - len(a))
Out[46]: [1, 2, 3, 4, 0]

In [47]: a = [1, 2, 3, 4, 5]

In [48]: a[:n] + [0,]*(n - len(a))
Out[48]: [1, 2, 3, 4, 5]

In [49]: a = [1, 2, 3, 4, 5, 6]

In [50]: a[:n] + [0,]*(n - len(a))
Out[50]: [1, 2, 3, 4, 5]
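
The one-liner covers both cases because multiplying a list by zero or a negative number gives an empty list, so nothing is appended when a already has n or more items, and the slice a[:n] does the truncating. Wrapped in a function it might look like this; the name `pad_or_truncate` and the `fill` parameter are just for illustration.

```python
def pad_or_truncate(a, n, fill=0):
    # Hypothetical wrapper around the expression above; a must be a list.
    # a[:n] keeps at most n items; [fill] * k is [] whenever k <= 0.
    return a[:n] + [fill] * (n - len(a))

print([0] * -1)                                 # an empty list
print(pad_or_truncate([1, 2, 3], 5))            # padded
print(pad_or_truncate([1, 2, 3, 4, 5, 6], 5))   # truncated
```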

Function using numpy:

In [121]: def tosize(a, n):
   .....:     a = np.asarray(a)
   .....:     x = np.zeros(n, dtype=a.dtype)
   .....:     m = min(n, len(a))
   .....:     x[:m] = a[:m]
   .....:     return x
   .....: 

In [122]: tosize([1, 2, 3], 5)
Out[122]: array([1, 2, 3, 0, 0])

In [123]: tosize([1, 2, 3, 4], 5)
Out[123]: array([1, 2, 3, 4, 0])

In [124]: tosize([1, 2, 3, 4, 5], 5)
Out[124]: array([1, 2, 3, 4, 5])

In [125]: tosize([1, 2, 3, 4, 5, 6], 5)
Out[125]: array([1, 2, 3, 4, 5])
Warren Weckesser