I have a dataframe with two columns, both of type int64. I'm trying to convert my pandas dataframe into a scipy csr_matrix using the following lines of code:

s = all_raw[['a','b']]  # my dataframe with two columns of type int64
t1 = s.as_matrix(columns = None)

t2 = scipy.sparse.csr_matrix(t1)

This is what t1 looks like:

array([[3, 1],
       [3, 0],
       [1, 1],
       ...,
       [1, 1],
       [2, 0],
       [2, 1]], dtype=object)

I'm getting the following error message:

../anaconda/envs/python3/lib/python3.6/site-packages/scipy/sparse/sputils.py in upcast(*args)
 49             return t
 50 
---> 51     raise TypeError('no supported conversion for types: %r' % (args,))
 52 
 53 

TypeError: no supported conversion for types: (dtype('O'),)

What is going wrong here?

HHH
  • https://stackoverflow.com/questions/20459536/convert-pandas-dataframe-to-sparse-numpy-matrix-directly – Gabriel A Dec 06 '17 at 01:07
  • Show some contents of t1 and check the type of those. – sascha Dec 06 '17 at 01:08
  • @sascha I edited my original post to include the contents of t1. Please have a look. – HHH Dec 06 '17 at 01:17
  • You have the wrong dtype: object instead of int. I don't know why (it isn't shown in the question), but you can probably do `t1 = t1.astype(int)`. If that fails, there are non-castable objects in the part of the output that was trimmed. – sascha Dec 06 '17 at 01:17
  • That worked, thanks. It's weird: when I do dtypes(all_raw), it shows the type of those two columns as 'int64'. I'm not sure why they become object when I convert to numpy. Any idea? – HHH Dec 06 '17 at 01:24
  • No, sorry. Not much pandas for me these days. I think their docs or some code reading should suffice. Since I somewhat dislike questions being answered in comments, I recommend that you describe this observation precisely in an edit and ask specifically for the reason. Someone might be able to help. (Also add your pandas version.) – sascha Dec 06 '17 at 01:30
  • I mean all_raw.dtypes – HHH Dec 06 '17 at 01:30
  • Pandas converts columns or the whole dataframe to object dtype if it contains mixed types. `None` in particular can only be represented as an object, not as an integer or `nan`. `pandas` has its own sparse format, and now has a means of converting it to `scipy` sparse. – hpaulj Dec 06 '17 at 04:16
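
The fix suggested in the comments can be sketched roughly as follows. This is only a minimal sketch, not the asker's actual data: the frame `s` is a made-up stand-in for `all_raw[['a','b']]`, and the object array is forced with `astype(object)` because the question does not show why the original conversion produced `dtype=object`. It also assumes pandas 0.25 or later, where `to_numpy()` and the `.sparse` accessor are available.

```python
import numpy as np
import pandas as pd
import scipy.sparse

# Hypothetical stand-in for all_raw[['a', 'b']]; the values are made up.
s = pd.DataFrame({"a": [3, 3, 1, 2], "b": [1, 0, 1, 1]}, dtype="int64")

# Force an object-dtype array to mimic the t1 shown in the question.
t1 = s.to_numpy().astype(object)
print(t1.dtype)

# scipy.sparse cannot store Python objects, so cast to a numeric dtype first.
# astype raises if some element cannot be converted to an integer.
t1_int = t1.astype(np.int64)
t2 = scipy.sparse.csr_matrix(t1_int)
print(t2.dtype, t2.shape, t2.nnz)

# Alternative route through the pandas sparse accessor: convert the columns
# to a sparse dtype with fill value 0, then export to scipy COO and CSR.
t3 = s.astype(pd.SparseDtype("int64", 0)).sparse.to_coo().tocsr()
print((t3.toarray() == t2.toarray()).all())
```

Checking `t1.dtype` before building the sparse matrix is a cheap way to catch the problem early. If the cast fails, the array probably holds values such as `None` or strings that need cleaning first.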

0 Answers