0

I want to convert a list of tuples into a numpy array. For example:

items = [(1, 2), (3, 4)]

using np.asarray(items) I get:

array([[1, 2],
       [3, 4]])

but if I try to append the items individually:

new_array = np.empty(0)
for item in items:
    new_array = np.append(new_array, item)

new_array loses the original shape and becomes:

array([1., 2., 3., 4.])

I can get it to the shape I wanted using new_array.reshape(2, 2):

array([[1., 2.],
       [3., 4.]])

but how would I get that shape without reshaping?

waspinator
  • np.asarray() does what you want, so why are you looking for inefficient looping methods? – John Zwinck Nov 11 '18 at 03:36
  • `np.append` has several booby traps. Did you read its docs? Better yet read its code. What's the original shape? `new_array` starts with a (0,) shape. `items` isn't an array so doesn't have a shape. – hpaulj Nov 11 '18 at 04:08
  • Is the fact that it's a list of tuples instead of lists significant? – hpaulj Nov 11 '18 at 04:16

2 Answers

1

First, give the initial array the correct shape so that NumPy knows how to interpret the values passed to np.append. Here that means zero rows and two columns.

Then, to prevent automatic flattening, specify the axis you wish to append along.

This code does what you intended:

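import numpy as np

items = [(1, 2), (3, 4)]

# Start with zero rows and two columns so each appended row has a matching shape.
new_array = np.empty((0, 2))
for item in items:
    # Wrap item in a list so it is 2-D, and append along the row axis.
    new_array = np.append(new_array, [item], axis=0)

print(new_array) # [[1. 2.]
                 #  [3. 4.]]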
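
To see why the axis argument matters, here is a small sketch contrasting the two calls. Without axis, np.append flattens both inputs before joining them; with axis=0 it joins along rows and keeps the 2-D shape. The variable names (base, row, flat, stacked) are only for illustration.

```python
import numpy as np

base = np.empty((0, 2))          # zero rows, two columns
row = (1, 2)

# No axis: both inputs are flattened, so the 2-D shape is lost.
flat = np.append(base, row)
print(flat.shape)                # (2,)

# axis=0: the appended value must be 2-D too, hence the extra brackets.
stacked = np.append(base, [row], axis=0)
print(stacked.shape)             # (1, 2)
```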
  • Indeed this is the right way to use `np.append`, if you must. Not that we encourage the iterative use of any concatenate family; it's too inefficient. – hpaulj Nov 11 '18 at 04:52
  • @hpaulj Say that a user can't preallocate and *has* to dynamically grow an array over time (due to streaming data or the like). What's your own personal feeling as to the "best" pattern by which to carry out said dynamic array growth in the current version of numpy? – tel Nov 11 '18 at 05:41
  • Collecting values in a list and performing one array build at the end is a time honored method. But the incremental concatenate might be better if you need an array, rather than a list, at intermediate stages (for stats or some other calculation). Other things being equal it comes down to timings - what's faster, or at least fast enough in a realistic case. – hpaulj Nov 11 '18 at 07:11
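
A minimal sketch of the collect-in-a-list pattern mentioned in the comment above. The stream() generator is a hypothetical stand-in for data that arrives one row at a time; it is not part of NumPy or of the original question.

```python
import numpy as np

def stream():
    # Hypothetical stand-in for rows arriving one at a time.
    yield (1, 2)
    yield (3, 4)
    yield (5, 6)

rows = []
for item in stream():
    rows.append(item)      # appending to a list does not copy an array each step

result = np.array(rows)    # one array build at the end
print(result.shape)        # (3, 2)
```

If an array is needed at an intermediate stage, np.array(rows) can be called at that point as well, at the cost of one full copy per call.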
1

If you have a list of tuples and, for some reason, you've decided against the standard array constructors (np.array, np.asarray, etc., which, as @JohnZwinck pointed out, are probably the best answer), the most efficient approach would be to preallocate the entire array and then assign to it:

import numpy as np

items = [(1, 2), (3, 4)]
arr = np.empty((len(items), len(items[0])))

arr[...] = items

Even if what you want is to grow an array over time, row-by-row, it has been shown through detailed timings that you're usually better off just allocating a whole new array and then copying over the old values.

So, given the arr above, the most efficient way to append a row with this approach would be:

newitem = (5, 6)
oldarr = arr
arr = np.empty((oldarr.shape[0] + 1, *oldarr.shape[1:]))

arr[:-1,:] = oldarr
arr[-1,:] = newitem
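
The same steps can be wrapped in a small helper. This is only a sketch: append_row is a hypothetical name, not a NumPy function, and it assumes the new row matches the shape of the existing rows.

```python
import numpy as np

def append_row(arr, row):
    """Return a new array with `row` added as the last row of `arr`."""
    # Allocate one extra row, keeping the trailing dimensions and dtype.
    out = np.empty((arr.shape[0] + 1, *arr.shape[1:]), dtype=arr.dtype)
    out[:-1] = arr    # copy the old rows
    out[-1] = row     # fill in the new one
    return out

arr = np.array([(1, 2), (3, 4)], dtype=float)
arr = append_row(arr, (5, 6))
print(arr.shape)      # (3, 2)
```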
tel