I'm developing tooling based on pandas DataFrame objects. I would like to keep a scipy sparse matrix as a column of a DataFrame without converting it row-wise to a list or a numpy array of dtype('O').
The snippet below doesn't work: pandas treats the matrix as a scalar and suggests adding an index. When I provide a pd.RangeIndex over the matrix's row indices, the whole matrix is repeated in every row of the DataFrame (because pandas still treats it as a scalar).
import numpy as np
import pandas as pd
import scipy.sparse
ma = scipy.sparse.rand(10, 100, 0.1, 'csr', dtype=np.float64)
df = pd.DataFrame(dict(X=ma))
This does work:
df = pd.DataFrame(dict(X=list(ma)))
However, this cuts the matrix row-wise into CSR matrices of one row each, which I would then need to vstack every time I want to work on the original matrix.
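For reference, here is a minimal sketch of that round trip, assuming scipy.sparse.vstack is what does the reassembly (the name `restored` is only illustrative):

```python
import numpy as np
import pandas as pd
import scipy.sparse

ma = scipy.sparse.rand(10, 100, 0.1, 'csr', dtype=np.float64)

# Iterating the matrix yields one single-row CSR matrix per row,
# so the column ends up holding 10 separate objects (dtype('O')).
df = pd.DataFrame(dict(X=list(ma)))

# Stack the per-row matrices back into one 10 x 100 CSR matrix.
# This has to be repeated whenever the full matrix is needed.
restored = scipy.sparse.vstack(df['X'].tolist(), format='csr')
```

This copies the data on every reassembly, which is the overhead I'd like to avoid.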
Any pointers? I tried wrapping the CSR matrix in a pd.Series object, pretending it has dtype('O'), but I ran into many places where pandas assumes the underlying data is a numpy array.