
I have a list of lists of tuples:

[[(0, 0.5), (1, 0.6)], [(4, 0.01), (5, 0.005), (6, 0.002)], [(1,0.7)]]

I need to build an X x Y matrix from it:

x = number of sublists
y = largest first element across all pairs, plus 1
elem[x, y] = second element of the pair in sublist x whose first element equals y, otherwise 0

    0     1     2     3     4      5      6
  0.5   0.6     0     0     0      0      0
    0     0     0     0  0.01  0.005  0.002
    0   0.7     0     0     0      0      0
Jérôme Richard
JulJ

2 Answers

2

You can work out the array's dimensions as follows. The Y dimension is the number of sublists:

>>> data = [[(0, 0.5), (1, 0.6)], [(4, 0.01), (5, 0.005), (6, 0.002)], [(1,0.7)]]
>>> dim_y = len(data)
>>> dim_y
3

The X dimension is the largest [0] index of all of the tuples, plus 1.

>>> dim_x = max(max(i for i,j in sub) for sub in data) + 1
>>> dim_x
7

Then initialize an array of zeros with this shape:

>>> import numpy as np
>>> arr = np.zeros((dim_x, dim_y))
>>> arr
array([[0., 0., 0.],
       [0., 0., 0.],
       [0., 0., 0.],
       [0., 0., 0.],
       [0., 0., 0.],
       [0., 0., 0.],
       [0., 0., 0.]])

To fill it, enumerate over the sublists to keep track of the y index. Within each sublist, the [0] element of each tuple is the x index and the [1] element is the value to store:

for y, sub in enumerate(data):
    for x, value in sub:
        arr[x,y] = value

The array is now populated. Transpose it to get the layout you asked for:

>>> arr.T
array([[0.5  , 0.6  , 0.   , 0.   , 0.   , 0.   , 0.   ],
       [0.   , 0.   , 0.   , 0.   , 0.01 , 0.005, 0.002],
       [0.   , 0.7  , 0.   , 0.   , 0.   , 0.   , 0.   ]])
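The steps above can be collected into one function that writes directly into the row-per-sublist layout, so no transpose is needed. This is only a sketch: the name `tuples_to_dense` is made up for illustration, and the `default=-1` argument is an added assumption so that input with only empty sublists does not raise.

```python
import numpy as np

def tuples_to_dense(data):
    """Build a (len(data), max_index + 1) array from lists of (index, value) pairs."""
    n_rows = len(data)
    # Largest first element over all tuples, plus 1.
    # default=-1 gives zero columns when every sublist is empty.
    n_cols = max((i for sub in data for i, _ in sub), default=-1) + 1
    arr = np.zeros((n_rows, n_cols))
    for row, sub in enumerate(data):
        for col, value in sub:
            arr[row, col] = value
    return arr

data = [[(0, 0.5), (1, 0.6)], [(4, 0.01), (5, 0.005), (6, 0.002)], [(1, 0.7)]]
result = tuples_to_dense(data)
print(result)
```

Note that the original `max(max(i for i,j in sub) for sub in data)` raises a ValueError if any sublist is empty, because the inner `max` receives no values; flattening into a single generator sidesteps that.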
Cory Kramer
  • Cory, thanks for reply! I hoped there's some numpy's magic to avoid loops )) It seems there isn't ) – JulJ Jun 11 '21 at 11:54
  • If I had more coffee and thought for a while there might be.... this is just the naive way that came to mind. If I think of something more clever in pure numpy I'll update my answer. – Cory Kramer Jun 11 '21 at 11:55
  • Avoiding loops would require turning `data` into an array, which is a time-consuming step. `data` is also a mix of integers (indices) and floats (values), and more significantly, each row has a different number of tuples. – hpaulj Jun 11 '21 at 15:55
2

As I commented on the accepted answer, data is 'ragged' and can't be made into an array.

Now if the data had a more regular form, a no-loop solution is possible. But conversion to such a form requires the same double looping!

In [814]: [(i,j,v) for i,row in enumerate(data) for j,v in row]
Out[814]: 
[(0, 0, 0.5),
 (0, 1, 0.6),
 (1, 4, 0.01),
 (1, 5, 0.005),
 (1, 6, 0.002),
 (2, 1, 0.7)]

Then 'transpose' the list and separate it into 3 variables (`_` is IPython's name for the previous output):

In [815]: I,J,V=zip(*_)
In [816]: I,J,V
Out[816]: ((0, 0, 1, 1, 1, 2), (0, 1, 4, 5, 6, 1), (0.5, 0.6, 0.01, 0.005, 0.002, 0.7))

I stuck with the list transpose here so as to not convert the integer indices to floats. It may also be faster, since making an array from a list isn't a time-trivial task.

Now we can assign values via numpy magic:

In [819]: arr = np.zeros((3,7))
In [820]: arr[I,J]=V
In [821]: arr
Out[821]: 
array([[0.5  , 0.6  , 0.   , 0.   , 0.   , 0.   , 0.   ],
       [0.   , 0.   , 0.   , 0.   , 0.01 , 0.005, 0.002],
       [0.   , 0.7  , 0.   , 0.   , 0.   , 0.   , 0.   ]])
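The session above relies on IPython's `_` and a hard-coded `(3, 7)` shape. As a self-contained sketch of the same idea, the shape can be derived from the data instead. The variable name `triples` is introduced here for illustration, and the code assumes `data` contains at least one tuple (otherwise `zip(*triples)` has nothing to unpack).

```python
import numpy as np

data = [[(0, 0.5), (1, 0.6)], [(4, 0.01), (5, 0.005), (6, 0.002)], [(1, 0.7)]]

# Flatten the ragged input into (row, column, value) triples.
triples = [(i, j, v) for i, row in enumerate(data) for j, v in row]

# 'Transpose' the triples into three parallel tuples.
I, J, V = zip(*triples)

# Rows = number of sublists, columns = largest column index + 1.
arr = np.zeros((len(data), max(J) + 1))

# One indexed assignment places every value.
arr[I, J] = V
print(arr)
```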

I,J,V could also be used as input to a scipy.sparse.coo_matrix call, making a sparse matrix.
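A minimal sketch of that `coo_matrix` call, using the `(values, (rows, cols))` input form. The explicit `shape` argument is derived from the data here as an assumption about what is wanted; without it, scipy infers the shape from the largest indices.

```python
import numpy as np
from scipy import sparse

data = [[(0, 0.5), (1, 0.6)], [(4, 0.01), (5, 0.005), (6, 0.002)], [(1, 0.7)]]

# Same flattening as before: row indices, column indices, values.
I, J, V = zip(*[(i, j, v) for i, row in enumerate(data) for j, v in row])

shape = (len(data), max(J) + 1)
M = sparse.coo_matrix((V, (I, J)), shape=shape)

# Convert to a dense array only if the full matrix is actually needed.
dense = M.toarray()
print(dense)
```

One difference from the indexed assignment: if the same (row, column) pair appears more than once, converting a coo matrix to dense or csr sums the duplicates rather than keeping a single value.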

Speaking of a sparse matrix, here's what a sparse version of arr looks like:

In list-of-lists format:

In [822]: from scipy import sparse
In [823]: M = sparse.lil_matrix(arr)
In [824]: M
Out[824]: 
<3x7 sparse matrix of type '<class 'numpy.float64'>'
    with 6 stored elements in List of Lists format>
In [825]: M.A
Out[825]: 
array([[0.5  , 0.6  , 0.   , 0.   , 0.   , 0.   , 0.   ],
       [0.   , 0.   , 0.   , 0.   , 0.01 , 0.005, 0.002],
       [0.   , 0.7  , 0.   , 0.   , 0.   , 0.   , 0.   ]])
In [826]: M.rows
Out[826]: array([list([0, 1]), list([4, 5, 6]), list([1])], dtype=object)
In [827]: M.data
Out[827]: 
array([list([0.5, 0.6]), list([0.01, 0.005, 0.002]), list([0.7])],
      dtype=object)

and the more common coo format:

In [828]: Mc=M.tocoo()
In [829]: Mc.row
Out[829]: array([0, 0, 1, 1, 1, 2], dtype=int32)
In [830]: Mc.col
Out[830]: array([0, 1, 4, 5, 6, 1], dtype=int32)
In [831]: Mc.data
Out[831]: array([0.5  , 0.6  , 0.01 , 0.005, 0.002, 0.7  ])

and the csr used for most calculations:

In [832]: Mr=M.tocsr()
In [833]: Mr.data
Out[833]: array([0.5  , 0.6  , 0.01 , 0.005, 0.002, 0.7  ])
In [834]: Mr.indices
Out[834]: array([0, 1, 4, 5, 6, 1], dtype=int32)
In [835]: Mr.indptr
Out[835]: array([0, 2, 5, 6], dtype=int32)
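The csr attributes map closely onto the original ragged list: `indptr` is the running total of sublist lengths, so row r owns the slice `indptr[r]:indptr[r+1]` of `indices` and `data`. The following numpy-only sketch rebuilds those three arrays from the input and expands them back to a dense array. The names `lengths`, `values` and `rows` are chosen for illustration, and the code assumes at least one tuple is present.

```python
import numpy as np

data = [[(0, 0.5), (1, 0.6)], [(4, 0.01), (5, 0.005), (6, 0.002)], [(1, 0.7)]]

# Number of stored values per row, and the running total that delimits rows.
lengths = [len(row) for row in data]
indptr = np.concatenate(([0], np.cumsum(lengths)))

# Column indices and values, concatenated row after row.
indices = np.array([j for row in data for j, _ in row])
values = np.array([v for row in data for _, v in row])

# Expand indptr back into one row index per stored value.
rows = np.repeat(np.arange(len(data)), np.diff(indptr))

dense = np.zeros((len(data), indices.max() + 1))
dense[rows, indices] = values
print(indptr, indices, values)
```

For this input the three arrays match `Mr.indptr`, `Mr.indices` and `Mr.data` shown above.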
hpaulj