non-NDFrame object error using pandas.SparseSeries.from_coo() function

Question

I am trying to convert a COO sparse matrix (from scipy.sparse) to a pandas SparseSeries. The documentation (http://pandas.pydata.org/pandas-docs/stable/sparse.html) says to use SparseSeries.from_coo(A). The conversion itself seems to work, but when I try to print the resulting series, this is what happens.

A 10x10 matrix seems OK:

import pandas as pd 
import scipy.sparse as ss 
import numpy as np 
row = (np.random.random(10)*10).astype(int) 
col = (np.random.random(10)*10).astype(int) 
val = np.random.random(10)*10 
sparse = ss.coo_matrix((val,(row,col)),shape=(10,10)) 
pss = pd.SparseSeries.from_coo(sparse)
print pss
0  7    1.416631
   9    5.833902
1  0    4.131919
2  3    2.820531
   7    2.227009
3  1    9.205619
4  4    8.309077
6  0    4.376921
7  6    8.444013
   7    7.383886
dtype: float64
BlockIndex
Block locations: array([0])
Block lengths: array([10])

But a 100x100 matrix fails:

import pandas as pd 
import scipy.sparse as ss 
import numpy as np 
row = (np.random.random(100)*100).astype(int) 
col = (np.random.random(100)*100).astype(int) 
val = np.random.random(100)*100 
sparse = ss.coo_matrix((val,(row,col)),shape=(100,100)) 
pss = pd.SparseSeries.from_coo(sparse)
print pss

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-790-f0c22a601b93> in <module>()
      7 sparse = ss.coo_matrix((val,(row,col)),shape=(100,100))
      8 pss = pd.SparseSeries.from_coo(sparse)
----> 9 print pss
     10 

C:\Users\ej\AppData\Local\Continuum\Anaconda\lib\site-packages\pandas\core\base.pyc in __str__(self)
     45         if compat.PY3:
     46             return self.__unicode__()
---> 47         return self.__bytes__()
     48 
     49     def __bytes__(self):

C:\Users\ej\AppData\Local\Continuum\Anaconda\lib\site-packages\pandas\core\base.pyc in __bytes__(self)
     57 
     58         encoding = get_option("display.encoding")
---> 59         return self.__unicode__().encode(encoding, 'replace')
     60 
     61     def __repr__(self):

C:\Users\ej\AppData\Local\Continuum\Anaconda\lib\site-packages\pandas\sparse\series.pyc in __unicode__(self)
    287     def __unicode__(self):
    288         # currently, unicode is same as repr...fixes infinite loop
--> 289         series_rep = Series.__unicode__(self)
    290         rep = '%s\n%s' % (series_rep, repr(self.sp_index))
    291         return rep

C:\Users\ej\AppData\Local\Continuum\Anaconda\lib\site-packages\pandas\core\series.pyc in __unicode__(self)
    895 
    896         self.to_string(buf=buf, name=self.name, dtype=self.dtype,
--> 897                        max_rows=max_rows)
    898         result = buf.getvalue()
    899 

C:\Users\ej\AppData\Local\Continuum\Anaconda\lib\site-packages\pandas\core\series.pyc in to_string(self, buf, na_rep, float_format, header, length, dtype, name, max_rows)
    960         the_repr = self._get_repr(float_format=float_format, na_rep=na_rep,
    961                                   header=header, length=length, dtype=dtype,
--> 962                                   name=name, max_rows=max_rows)
    963 
    964         # catch contract violations

C:\Users\ej\AppData\Local\Continuum\Anaconda\lib\site-packages\pandas\core\series.pyc in _get_repr(self, name, header, length, dtype, na_rep, float_format, max_rows)
    989                                         na_rep=na_rep,
    990                                         float_format=float_format,
--> 991                                         max_rows=max_rows)
    992         result = formatter.to_string()
    993 

C:\Users\ej\AppData\Local\Continuum\Anaconda\lib\site-packages\pandas\core\format.pyc in __init__(self, series, buf, length, header, na_rep, name, float_format, dtype, max_rows)
    145         self.dtype = dtype
    146 
--> 147         self._chk_truncate()
    148 
    149     def _chk_truncate(self):

C:\Users\ej\AppData\Local\Continuum\Anaconda\lib\site-packages\pandas\core\format.pyc in _chk_truncate(self)
    158             else:
    159                 row_num = max_rows // 2
--> 160                 series = concat((series.iloc[:row_num], series.iloc[-row_num:]))
    161             self.tr_row_num = row_num
    162         self.tr_series = series

C:\Users\ej\AppData\Local\Continuum\Anaconda\lib\site-packages\pandas\tools\merge.pyc in concat(objs, axis, join, join_axes, ignore_index, keys, levels, names, verify_integrity, copy)
    752                        keys=keys, levels=levels, names=names,
    753                        verify_integrity=verify_integrity,
--> 754                        copy=copy)
    755     return op.get_result()
    756 

C:\Users\ej\AppData\Local\Continuum\Anaconda\lib\site-packages\pandas\tools\merge.pyc in __init__(self, objs, axis, join, join_axes, keys, levels, names, ignore_index, verify_integrity, copy)
    803         for obj in objs:
    804             if not isinstance(obj, NDFrame):
--> 805                 raise TypeError("cannot concatenate a non-NDFrame object")
    806 
    807             # consolidate

TypeError: cannot concatenate a non-NDFrame object

I don't really understand the error message. As far as I can tell, I am following the example in the documentation to the letter, just with my own COO matrix. Could it be the size?

Regards
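For readers on a current pandas: SparseSeries (and with it SparseSeries.from_coo) was deprecated in pandas 0.25 and removed in 1.0. Below is a minimal sketch of the same 100x100 example using the pd.Series.sparse.from_coo accessor instead, assuming pandas 0.25 or later and a numpy with default_rng. The fixed seed and the tocsr().tocoo() clean-up (suggested in the comments) are additions for illustration, not part of the original question.

```python
import numpy as np
import pandas as pd
import scipy.sparse as ss

# Same kind of random 100x100 COO matrix as in the question, with a fixed seed
rng = np.random.default_rng(0)
row = rng.integers(0, 100, size=100)
col = rng.integers(0, 100, size=100)
val = rng.random(100) * 100
A = ss.coo_matrix((val, (row, col)), shape=(100, 100))

# Optional: merge duplicate (row, col) entries via a CSR round trip
A = A.tocsr().tocoo()

# Current-pandas replacement for SparseSeries.from_coo: a Series with a
# (row, col) MultiIndex and a Sparse dtype
s = pd.Series.sparse.from_coo(A)
print(s)  # more entries than display.max_rows, so the repr is truncated
```

On a current pandas the truncated repr prints without the concat error shown above.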

Yeah, looks OK to me at first glance. Maybe it is size-related, as you speculate. Does it work on smaller matrices? — JohnE, Aug 12 '15 at 18:27
Nope. See screenshot: http://imgur.com/X4d8cL5, unless you consider a 162x95 sparse matrix too large?! Do you think it could be a bug then? Thank you for your help. — Francesco, Aug 13 '15 at 08:32
No, it's not that big. The best way to troubleshoot this, or to prove it is a bug, is to post actual sample data so others can replicate it. — JohnE, Aug 13 '15 at 12:07
@JohnE, thanks. OK, not sure where best to put the test code, but here it is: `import pandas as pd` `import scipy.sparse as ss` `import numpy as np` `row = (np.random.random(100)*100).astype(int)` `col = (np.random.random(100)*100).astype(int)` `val = np.random.random(100)*100` `sparse = ss.coo_matrix((val,(row,col)),shape=(100,100))` `pss = pd.SparseSeries.from_coo(sparse)` `pss` This gives me the same error. — Francesco, Aug 13 '15 at 20:58
I have only dabbled with sparse matrices so I can't say what is going on. If you don't get any suggestions here on SO, you may want to raise an issue at github: https://github.com/pydata/pandas/issues — JohnE, Aug 13 '15 at 21:39
Best thing is to put the code in the original question. I replicated the problem with your code whereas it seems to work fine for a 10x10 instead of 100x100. Ideally show both: how it works for 10x10 and not for 100x100. Actually, I'll go ahead and edit it in but please alter or add to it as you like. — JohnE, Aug 13 '15 at 21:46
I think the way you are creating the matrix allows it to have overlapping entries -- e.g. 2 different values could be mapped to row 2, column 6. I doubt that is the problem but I suspect that is not really a good way to do it either. — JohnE, Aug 13 '15 at 21:58
By default, the coo_matrix adds the values in `data` which have the same index position. This is actually a useful feature, particularly if you want to down-sample your data (you simply divide the `row` or `column` elements by your bin step). I am pretty sure this happens in my examples, so perhaps it's that... — Francesco, Aug 13 '15 at 22:20
`coo_matrix()` does not actually sum duplicate values; it just stores those 3 input arrays in its attributes (without copy or change). The summation occurs when the matrix is converted to another format such as `csr`, or when it is displayed. It may be worth trying a `sparse=sparse.tocsr().tocoo()` round trip just to cleanup any duplication. — hpaulj, Dec 09 '15 at 23:04
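A small sketch of the behaviour described in that last comment, assuming a reasonably current scipy: two input entries land on position (2, 6), coo_matrix stores both, and the tocsr().tocoo() round trip sums them into one. The positions and values are made up for illustration.

```python
import numpy as np
import scipy.sparse as ss

# Two of the three entries share position (2, 6)
row = np.array([2, 2, 0])
col = np.array([6, 6, 1])
val = np.array([1.0, 3.0, 5.0])

A = ss.coo_matrix((val, (row, col)), shape=(10, 10))
print(A.nnz)  # 3: COO keeps the duplicate as a separate stored entry

# Converting to CSR sums duplicates; converting back gives a clean COO matrix
B = A.tocsr().tocoo()
print(B.nnz)              # 2
print(B.toarray()[2, 6])  # 4.0 (1.0 + 3.0)
```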

hpaulj · Answer 1 · 2015-12-10T01:25:38.480

I have an older version of pandas. It has the sparse code, but not the COO conversion methods. The pandas issue filed in connection with this is: https://github.com/pydata/pandas/issues/10818

But I found this on GitHub:

def _coo_to_sparse_series(A, dense_index=False):
    """ Convert a scipy.sparse.coo_matrix to a SparseSeries.
    Use the defaults given in the SparseSeries constructor. """
    s = Series(A.data, MultiIndex.from_arrays((A.row, A.col)))
    s = s.sort_index()
    s = s.to_sparse()  # TODO: specify kind?
    # ...
    return s
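The excerpt ends with to_sparse(), which was removed in pandas 1.0. Here is a sketch of the same steps on a current pandas, converting to a SparseDtype instead. coo_to_sparse_series is a hypothetical helper name, and the 10x5 matrix mirrors the one in the session that follows; this is an illustration, not the pandas implementation.

```python
import numpy as np
import pandas as pd
import scipy.sparse as ss

def coo_to_sparse_series(A):
    """Hypothetical re-creation of the excerpt's steps on current pandas."""
    # Index each stored value by its (row, col) position
    s = pd.Series(A.data, index=pd.MultiIndex.from_arrays((A.row, A.col)))
    s = s.sort_index()
    # to_sparse() no longer exists, so convert to a Sparse dtype instead
    return s.astype(pd.SparseDtype(s.dtype))

A = ss.coo_matrix(np.arange(10 * 5).reshape(10, 5))
s = coo_to_sparse_series(A)
print(len(s))  # 49: the zero at (0, 0) is not stored in the COO matrix
```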

With a smallish sparse matrix, I can construct and display the series without problems:

In [259]: Asml=sparse.coo_matrix(np.arange(10*5).reshape(10,5))
In [260]: s=pd.Series(Asml.data,pd.MultiIndex.from_arrays((Asml.row,Asml.col)))
In [261]: s=s.sort_index()
In [262]: s
Out[262]: 
0  1     1
   2     2
   3     3
   4     4
1  0     5
   1     6
   2     7
 [...  mine]
   3    48
   4    49
dtype: int32
In [263]: ssml=s.to_sparse()
In [264]: ssml
Out[264]: 
0  1     1
   2     2
   3     3
   4     4
1  0     5
  [...  mine]
   2    47
   3    48
   4    49
dtype: int32
BlockIndex
Block locations: array([0])
Block lengths: array([49])

But with a larger array (more nonzero elements), I get a display error. I'm guessing it happens when the display of the (plain) series starts to use an ellipsis (...). I'm running Py3, so I get a different error message:

....\pandas\core\base.pyc in __str__(self)
     45         if compat.PY3:
     46             return self.__unicode__()   # py3
     47         return self.__bytes__()         # py2 route

e.g.:

In [265]: Asml=sparse.coo_matrix(np.arange(10*7).reshape(10,7))
In [266]: s=pd.Series(Asml.data,pd.MultiIndex.from_arrays((Asml.row,Asml.col)))
In [267]: s=s.sort_index()
In [268]: s
Out[268]: 
0  1     1
   2     2
   3     3
   4     4
   5     5
   6     6
1  0     7
   1     8
   2     9
   3    10
   4    11
   5    12
   6    13
2  0    14
   1    15
...
7  6    55
8  0    56
   1    57
[... mine]
Length: 69, dtype: int32
In [269]: ssml=s.to_sparse()
In [270]: ssml
Out[270]: <repr(<pandas.sparse.series.SparseSeries at 0xaff6bc0c>) failed: AttributeError: 'SparseArray' object has no attribute '_get_repr'>

I'm not sufficiently familiar with pandas code and structures to deduce much more for now.
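Judging from the traceback in the question, the failure is in the truncation branch of _chk_truncate, which concatenates two iloc slices and presumably only runs when the series has more entries than display.max_rows (60 by default in current pandas). That would fit both data points: the 10x10 example stores at most 10 entries, the 100x100 one up to 100. A hedged workaround sketch follows; will_truncate and print_untruncated are hypothetical helpers that raise display.max_rows just enough for the concat branch to be skipped. They have only been reasoned about against current pandas, not tested on the 0.16-era version from the question.

```python
import pandas as pd

def will_truncate(obj):
    """Return True if printing obj would take pandas' truncated-display path.

    Assumption: a Series repr is truncated once it has more rows than
    display.max_rows (None means no limit).
    """
    max_rows = pd.get_option("display.max_rows")
    return bool(max_rows) and len(obj) > max_rows

def print_untruncated(obj):
    """Print obj with display.max_rows raised so no truncation happens."""
    with pd.option_context("display.max_rows", max(len(obj), 1)):
        print(obj)

s10 = pd.Series(range(10))
s100 = pd.Series(range(100))
with pd.option_context("display.max_rows", 60):
    print(will_truncate(s10), will_truncate(s100))  # False True
print_untruncated(s100)  # prints all 100 rows
```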
