0

I am using scipy in python for sparse arrays/matrices.

I have a sparse array [generated out of dot product between other two]. The size of this array should not be big (can be hundreds) but this piece of code is being called many many times. So it must be very efficient.

What i need is an efficient way to create a string of 1 and 0 as follows:

    nonzeros = np.nonzero(hashcode)
    arr = ['0']*hashcode_length
    if nonzeros != None:
        for i in nonzeros[1]:
            arr[i] = '1'
    concatenated = ''.join(arr)

as an example... if the sparse array is of length 10 and the values are:

    [0, 0, 1, 1, 0, 1, 0, 0, 0, 1]

The output should be:

    "0011010001"

How can this code be improved?

Note:

  • sorry for not putting all the code, since it is big and has too much not-relevant details
  • Let me know if the question is not clear or some more details are needed
Samer Aamar
  • 1,298
  • 1
  • 15
  • 23
  • Where is the array coming from? What are you doing with the final strings? The reason I ask is because performance critical portions of a python program can be written in c/c++: At the bottom of [this page](http://scipy.github.io/old-wiki/pages/PerformancePython) they show a comparison of languages. In personal experience, when we rewrote a graph solver from python to c, we experienced a several _orders of magnitude_ increase in speed. Python took about 90 seconds to solve one graph. In C, it solved all 1200 of our graphs in under a second, due to clever use of bit twiddling. – TemporalWolf Oct 25 '16 at 16:57
  • When are you expecting `nonzeros` to be `None`? – hpaulj Oct 25 '16 at 18:09
  • The arrays are out of calculation and multiplication - i am implementing some algorithm using python. You can simply assume that the array is given and that it has values either 1 or 0 (integer values) – Samer Aamar Oct 25 '16 at 19:43

1 Answers1

0

My first idea is this:

In [1380]: x=sparse.coo_matrix([0,0,1,1,0,1,0,0,0,1])
In [1381]: ''.join([str(i) for i in x.A.ravel().tolist()])
Out[1381]: '0011010001'

Not necessarily better, but I think it illustrates some key issues. Is it useful to work with the sparse matrix or dense? How do you convert integers to strings? Is this a 1 row matix?

I can improve the string conversion with astype:

In [1393]: ''.join(x.A.ravel().astype('U1'))
Out[1393]: '0011010001'

join is performing a list iteration on the array.

With bytestrings (PY3, or normal string in PY2), tostring is an alternative to join. This just returns the databuffer as a string:

In [1451]: x.A.astype('S1').tostring()
Out[1451]: b'0011010001'

I can use the astype on the sparse matrix, but there's a bug preventing me from making that dense:

In [1397]: x.astype('U1').A
...
ValueError: unsupported data types in input

===========================

A variation on your iteration; start with a 0 string; make it a list; sparse.nonzero just returns the .col values from the coo format matrix.

In [1403]: ll    # ll = '0'*x.shape[1]
Out[1403]: '0000000000'
In [1404]: ll=list(ll)
In [1405]: ll
Out[1405]: ['0', '0', '0', '0', '0', '0', '0', '0', '0', '0']
In [1406]: for i in x.col:
      ...:     ll[i]='1'
      ...:     
In [1407]: ll
Out[1407]: ['0', '0', '1', '1', '0', '1', '0', '0', '0', '1']
In [1408]: ''.join(ll)
Out[1408]: '0011010001'

Or doing the same thing with a string array:

In [1416]: ll=np.empty(x.shape[1], dtype='U1')
In [1417]: ll.fill('0')
In [1418]: ll
Out[1418]: 
array(['0', '0', '0', '0', '0', '0', '0', '0', '0', '0'], 
      dtype='<U1')
In [1419]: ll[x.col]='1'
In [1420]: ll
Out[1420]: 
array(['0', '0', '1', '1', '0', '1', '0', '0', '0', '1'], 
      dtype='<U1')

This avoids the loop(s) since I can assign multiple values at once.

For this small example, the list solutions might be just as fast or faster. Array versions have some array creation overhead, so they are best if the case is large.

Even for a coo matrix with (1,1210) shape, this list iterative version is noticeably faster:

def foo1(x):
    ll=list('0'*x.shape[1])   # ll=['0']*x.shape[1] is little faster
    for i in x.col:
        ll[i]='1'
    return ''.join(ll)

If the matrix is not coo, either convert it, x.tocoo() or use x.nonzero() (but look at its code).

=========

I've ignored your None test. Why is that there? It could be dangerous

In [1448]: x.nonzero()[1] != None
/usr/local/bin/ipython3:1: FutureWarning: comparison to `None` will result in an elementwise object comparison in the future.
  #!/usr/bin/python3
Out[1448]: True
hpaulj
  • 221,503
  • 14
  • 230
  • 353