I want to understand how to work with sparse matrices. I have this code to generate multi-label classification data set as a sparse matrix.
from sklearn.datasets import make_multilabel_classification
X, y = make_multilabel_classification(sparse = True, n_labels = 20, return_indicator = 'sparse', allow_unlabeled = False)
This code gives me X in the following format:
<100x20 sparse matrix of type '<class 'numpy.float64'>'
with 1797 stored elements in Compressed Sparse Row format>
y:
<100x5 sparse matrix of type '<class 'numpy.int64'>'
with 471 stored elements in Compressed Sparse Row format>
Now I need to split X and y into X_train, X_test, y_train and y_test, so that train set consitutes 70%. How can I do it?
This is what I tried:
X_train, X_test, y_train, y_test = train_test_split(X.toarray(), y, stratify=y, test_size=0.3)
and got the error message:
TypeError: A sparse matrix was passed, but dense data is required. Use X.toarray() to convert to a dense numpy array.