In numpy, reshape means to change the shape in a way that keeps the same number of elements. So the product of the shape terms can't change.
The simplest example is something like
np.arange(12).reshape(3,4)
The assignment method is:
x = np.arange(12)
x.shape = (3,4)
The method (or np.reshape(...)) returns a new array (a view of the same data where possible). The shape assignment works in-place.
The note from the docs that you quote comes into play when doing something like
x = np.arange(12).reshape(3,4).T
x.reshape(3,4) # ok, but copy
x.shape = (3,4) # raises error
To better understand what's happening here, print the array at different stages, and look at how the contiguity of the original 0,1,2,... sequence changes. (That's left as an exercise for the reader since it isn't central to the bigger question.)
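As a rough illustration of the view/copy distinction described above, the sketch below uses np.shares_memory to check whether reshape returned a view or a copy, and shows the in-place shape assignment failing on the transposed array. The variable names are only for illustration.

```python
import numpy as np

# Contiguous source: reshape can hand back a view of the same buffer.
a = np.arange(12)
b = a.reshape(3, 4)
view_shares = np.shares_memory(a, b)

# Transposed source: shape (4, 3), no longer C-contiguous.
x = np.arange(12).reshape(3, 4).T
y = x.reshape(3, 4)                  # works, but has to copy the data
copy_shares = np.shares_memory(x, y)

# In-place shape assignment cannot copy, so it raises instead.
try:
    x.shape = (3, 4)
    shape_error = None
except AttributeError as err:
    shape_error = err

print(view_shares, copy_shares, type(shape_error).__name__)
print(x.flags["C_CONTIGUOUS"], y.flags["C_CONTIGUOUS"])
```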
There is a resize function and method, but it isn't used much, and its behavior with respect to views and copies is tricky.
np.concatenate (and variants like np.stack, np.vstack) make new arrays, and copy all the data from the inputs.
A list (and an object dtype array) contains pointers to the elements (which may be arrays), and so doesn't require copying data.
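A small sketch of that difference: after concatenating, changing an input array does not affect the result, while a list that holds the same array sees the change. The names here are illustrative only.

```python
import numpy as np

a = np.arange(3)
b = np.arange(3, 6)

joined = np.concatenate([a, b])   # new array with its own data buffer
stacked = np.vstack([a, b])       # likewise a fresh copy of the inputs
as_list = [a, b]                  # the list only stores references

a[0] = 100                        # modify one input in place

print(joined[0], stacked[0, 0], as_list[0][0])
print(np.shares_memory(a, joined), as_list[0] is a)
```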
Sparse matrices store their data (and row/col indices) in various attributes that differ among the formats. coo, csr and csc have 3 1d arrays. lil has 2 object arrays containing lists. dok is a dictionary subclass.
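To see those attributes directly, here is a minimal sketch that builds the same small matrix in three formats and prints the underlying arrays. It only covers coo, csr and lil; the small dense array is made up for the example.

```python
import numpy as np
from scipy import sparse

dense = np.array([[0, 2, 0],
                  [1, 0, 3]])

coo = sparse.coo_matrix(dense)
csr = sparse.csr_matrix(dense)
lil = sparse.lil_matrix(dense)

# coo: three parallel 1d arrays (values, row indices, column indices)
print(coo.data, coo.row, coo.col)

# csr: values, column indices, and a row pointer array of length nrows + 1
print(csr.data, csr.indices, csr.indptr)

# lil: two object arrays, each holding one Python list per row
print(lil.data, lil.rows)
```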
lil_matrix implements a reshape method. The other formats do not. As with np.reshape, the product of the dimensions can't change.
In theory a sparse matrix could be 'embedded' in a larger matrix with minimal copying of data, since all the new values will be the default 0, and not occupy any space. But the details for that operation have not been worked out for any of the formats.
sparse.hstack and sparse.vstack (don't use the numpy versions on sparse matrices) work by combining the coo attributes of the inputs (via sparse.bmat). So yes, they make new arrays (data, row, col).
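A quick way to confirm that stacking produces new arrays is to check memory sharing between an input and the result. This is only a sketch with two made-up 2x2 matrices; the exact internal path scipy takes may differ between versions.

```python
import numpy as np
from scipy import sparse

A = sparse.csr_matrix(np.array([[1, 0], [0, 2]]))
B = sparse.csr_matrix(np.array([[0, 3], [4, 0]]))

V = sparse.vstack([A, B])   # shape (4, 2)
H = sparse.hstack([A, B])   # shape (2, 4)

print(V.shape, H.shape)
# The stacked results own their data; nothing is shared with the inputs.
print(np.shares_memory(A.data, V.tocsr().data))
print(np.shares_memory(A.data, H.tocsr().data))
```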
A minimal example of making a larger sparse matrix:
In [110]: M = sparse.random(5,5,.2,'coo')
In [111]: M
Out[111]:
<5x5 sparse matrix of type '<class 'numpy.float64'>'
with 5 stored elements in COOrdinate format>
In [112]: M.A
Out[112]:
array([[0. , 0.80957797, 0. , 0. , 0. ],
[0. , 0. , 0. , 0. , 0. ],
[0. , 0.23618044, 0. , 0.91625967, 0.8791744 ],
[0. , 0. , 0. , 0. , 0. ],
[0. , 0. , 0. , 0.7928235 , 0. ]])
In [113]: M1 = sparse.coo_matrix((M.data, (M.row, M.col)),shape=(7,5))
In [114]: M1
Out[114]:
<7x5 sparse matrix of type '<class 'numpy.float64'>'
with 5 stored elements in COOrdinate format>
In [115]: M1.A
Out[115]:
array([[0. , 0.80957797, 0. , 0. , 0. ],
[0. , 0. , 0. , 0. , 0. ],
[0. , 0.23618044, 0. , 0.91625967, 0.8791744 ],
[0. , 0. , 0. , 0. , 0. ],
[0. , 0. , 0. , 0.7928235 , 0. ],
[0. , 0. , 0. , 0. , 0. ],
[0. , 0. , 0. , 0. , 0. ]])
In [116]: id(M1.data)
Out[116]: 139883362735488
In [117]: id(M.data)
Out[117]: 139883362735488
M and M1 have the same data attribute (same array id). But most operations on these matrices will require a conversion to another format (such as csr for math, or lil for changing values), and will involve copying and modifying the attributes. So this connection between the two matrices will be broken.
When you make a sparse matrix with a function like coo_matrix, and don't provide a shape parameter, it deduces the shape from the provided coordinates. If you provide a shape it uses that. That shape has to be at least as large as the implied shape. With lil (and dok) you can profitably create an 'empty' matrix with a large shape, and then set values iteratively. You don't want to do that with csr. And you can't directly set coo values.
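The shape behaviour described above can be sketched as follows. The coordinates and the 1000x1000 size are arbitrary values picked for the example.

```python
from scipy import sparse

data = [1.0, 2.0]
row = [0, 2]
col = [1, 3]

# No shape given: deduced from the largest row and column index.
deduced = sparse.coo_matrix((data, (row, col)))

# Explicit shape, larger than the implied one: the extra space costs nothing.
padded = sparse.coo_matrix((data, (row, col)), shape=(10, 10))

# lil: start with a large 'empty' matrix and fill it in piece by piece.
L = sparse.lil_matrix((1000, 1000))
L[5, 7] = 3.0
L[999, 0] = 1.0

print(deduced.shape, padded.shape, padded.nnz, L.nnz)
```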
The canonical way of creating sparse matrices is to build the data, row, and col arrays or lists iteratively from various pieces - with list append/extend or array concatenation - and make a coo (or csr) format matrix from that. So you do all the 'growing' before even creating the matrix.
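As a minimal sketch of that pattern, the loop below collects the entries of a small bidiagonal matrix in plain lists and only builds the sparse matrix at the end. The matrix contents are invented for the example.

```python
from scipy import sparse

n = 4
data, row, col = [], [], []

for i in range(n):
    # diagonal entry
    row.append(i)
    col.append(i)
    data.append(float(i + 1))
    # entry just above the diagonal, where it exists
    if i + 1 < n:
        row.append(i)
        col.append(i + 1)
        data.append(-1.0)

# All the 'growing' happened in the lists; build the matrix once.
M = sparse.coo_matrix((data, (row, col)), shape=(n, n)).tocsr()
print(M.toarray())
```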
changing _shape
Make a matrix:
In [140]: M = (sparse.random(5,3,.4,'csr')*10).astype(int)
In [141]: M
Out[141]:
<5x3 sparse matrix of type '<class 'numpy.int64'>'
with 6 stored elements in Compressed Sparse Row format>
In [142]: M.A
Out[142]:
array([[0, 6, 7],
[0, 0, 6],
[1, 0, 5],
[0, 0, 0],
[0, 6, 0]])
In [144]: M[1,0] = 10
... SparseEfficiencyWarning)
In [145]: M.A
Out[145]:
array([[ 0, 6, 7],
[10, 0, 6],
[ 1, 0, 5],
[ 0, 0, 0],
[ 0, 6, 0]])
Your new shape method (make sure the dtype of indptr doesn't change):
In [146]: M._shape = (6,3)
In [147]: newptr = np.hstack((M.indptr,M.indptr[-1]))
In [148]: newptr
Out[148]: array([0, 2, 4, 6, 6, 7, 7], dtype=int32)
In [149]: M.indptr = newptr
In [150]: M
Out[150]:
<6x3 sparse matrix of type '<class 'numpy.int64'>'
with 7 stored elements in Compressed Sparse Row format>
In [151]: M.A
Out[151]:
array([[ 0, 6, 7],
[10, 0, 6],
[ 1, 0, 5],
[ 0, 0, 0],
[ 0, 6, 0],
[ 0, 0, 0]])
In [152]: M[5,2]=10
... SparseEfficiencyWarning)
In [153]: M.A
Out[153]:
array([[ 0, 6, 7],
[10, 0, 6],
[ 1, 0, 5],
[ 0, 0, 0],
[ 0, 6, 0],
[ 0, 0, 10]])
Adding a column also seems to work:
In [154]: M._shape = (6,4)
In [155]: M
Out[155]:
<6x4 sparse matrix of type '<class 'numpy.int64'>'
with 8 stored elements in Compressed Sparse Row format>
In [156]: M.A
Out[156]:
array([[ 0, 6, 7, 0],
[10, 0, 6, 0],
[ 1, 0, 5, 0],
[ 0, 0, 0, 0],
[ 0, 6, 0, 0],
[ 0, 0, 10, 0]])
In [157]: M[5,0]=10
.... SparseEfficiencyWarning)
In [158]: M[5,3]=10
.... SparseEfficiencyWarning)
In [159]: M
Out[159]:
<6x4 sparse matrix of type '<class 'numpy.int64'>'
with 10 stored elements in Compressed Sparse Row format>
In [160]: M.A
Out[160]:
array([[ 0, 6, 7, 0],
[10, 0, 6, 0],
[ 1, 0, 5, 0],
[ 0, 0, 0, 0],
[ 0, 6, 0, 0],
[10, 0, 10, 10]])
attribute sharing
I can make a new matrix from an existing one:
In [108]: M = (sparse.random(5,3,.4,'csr')*10).astype(int)
In [109]: newptr = np.hstack((M.indptr,6))
In [110]: M1 = sparse.csr_matrix((M.data, M.indices, newptr), shape=(6,3))
The data attributes are shared, at least in the view sense:
In [113]: M[0,1]=14
In [114]: M1[0,1]
Out[114]: 14
But if I modify M1 by adding a nonzero value:
In [117]: M1[5,0]=10
...
SparseEfficiencyWarning)
The link between the matrices breaks:
In [120]: M[0,1]=3
In [121]: M1[0,1]
Out[121]: 14
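The construction used in this section can be wrapped in a small helper that avoids touching the private _shape attribute. This is only a sketch: append_empty_rows is a hypothetical name, not a scipy function, and it assumes the input is already in csr format. As shown above, any sharing of data with the original should not be relied on once either matrix is modified.

```python
import numpy as np
from scipy import sparse

def append_empty_rows(M, n):
    """Hypothetical helper: return a csr matrix with n empty rows added to M.

    Assumes M is a csr matrix. Reuses M.data and M.indices and only
    builds a longer indptr, keeping its dtype unchanged.
    """
    # Each new empty row repeats the last pointer value.
    tail = np.full(n, M.indptr[-1], dtype=M.indptr.dtype)
    newptr = np.concatenate((M.indptr, tail))
    new_shape = (M.shape[0] + n, M.shape[1])
    return sparse.csr_matrix((M.data, M.indices, newptr), shape=new_shape)

M = sparse.csr_matrix(np.array([[0, 6, 7],
                                [0, 0, 6],
                                [1, 0, 5]]))
M2 = append_empty_rows(M, 2)
print(M2.shape)
print(M2.toarray())
```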