Here is the info() summary and a printout of the pandas DataFrame I am working with:
Input:
final.info()
print(final)
Output:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 410978 entries, 0 to 410977
Data columns (total 3 columns):
 #   Column       Non-Null Count   Dtype
---  ------       --------------   -----
 0   customer_id  410978 non-null  int64
 1   item_id      410978 non-null  object
 2   item_qty     410978 non-null  int64
dtypes: int64(2), object(1)
memory usage: 9.4+ MB
        customer_id  item_id  item_qty
0             12346  15056BL         1
1             12346   15056N         1
2             12346   15056P         1
3             12346    20679         1
4             12346    20682         1
...             ...      ...       ...
410973        18287   85040B        12
410974        18287    85041        12
410975        18287   85199S        24
410976        18287   85232B        24
410977        18287       C2         1
[410978 rows x 3 columns]
I am trying to make a sparse matrix from this dataframe using the following code:
import numpy as np
from pandas.api.types import CategoricalDtype
from scipy import sparse

# Get all the unique customers for the sparse matrix rows
customers = list(np.sort(final.customer_id.unique()))
# Get all the unique products for the sparse matrix columns
products = list(final.item_id.unique())
# Get all the purchase quantities
quantity = list(final.item_qty)
# Get the row indices
cat_type = CategoricalDtype(categories=customers, ordered=True)
rows = final.customer_id.astype(cat_type)
# Get the column indices
cat_type1 = CategoricalDtype(categories=products, ordered=True)
cols = final.item_id.astype(cat_type1)
# Create the sparse matrix
sparse_purchase = sparse.csr_matrix((quantity,(rows,cols)), shape=(len(customers), len(products)))
Error:
ValueError Traceback (most recent call last)
<ipython-input-47-058d94e47e42> in <module>()
17
18 #Create the sparse matrix
---> 19 sparse_purchase = sparse.csr_matrix((quantity,(rows,cols)), shape=(len(customers), len(products)))
5 frames
/usr/local/lib/python3.7/dist-packages/numpy/core/_asarray.py in asarray(a, dtype, order)
81
82 """
---> 83 return array(a, dtype, copy=False, order=order)
84
85
ValueError: invalid literal for int() with base 10: '15056BL'
I think sparse.csr_matrix requires integer row and column indices, but item_id has object dtype (the IDs are strings such as '15056BL'), so I am unable to create the sparse matrix. Converting item_id to int is not possible either. Is there a way to build a sparse matrix when the column labels have object dtype?
Any help would be appreciated.
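One possible approach, as a minimal sketch rather than a definitive answer: the categorical Series themselves still hold the original labels (customer numbers and item strings), so passing them to csr_matrix hands it '15056BL' where it expects an integer. The integer position of each label within the categories is available through the `.cat.codes` accessor, and those codes can serve as row and column indices. The small DataFrame below is a hypothetical stand-in for `final`, built from a few rows of the printout above.

```python
import numpy as np
import pandas as pd
from pandas.api.types import CategoricalDtype
from scipy import sparse

# Hypothetical stand-in for `final` (a few rows copied from the printout).
final = pd.DataFrame({
    "customer_id": [12346, 12346, 12346, 18287, 18287],
    "item_id": ["15056BL", "15056N", "20679", "85041", "C2"],
    "item_qty": [1, 1, 1, 12, 1],
})

# Unique labels define the row and column order of the matrix.
customers = np.sort(final["customer_id"].unique())
products = final["item_id"].unique()

# .cat.codes yields each label's integer position within `categories`.
row_type = CategoricalDtype(categories=customers, ordered=True)
col_type = CategoricalDtype(categories=products, ordered=True)
rows = final["customer_id"].astype(row_type).cat.codes
cols = final["item_id"].astype(col_type).cat.codes

# Build the matrix from (data, (row_index, col_index)) using integer codes.
sparse_purchase = sparse.csr_matrix(
    (final["item_qty"].to_numpy(), (rows.to_numpy(), cols.to_numpy())),
    shape=(len(customers), len(products)),
)
```

With this layout, `customers[i]` and `products[j]` map row i and column j back to the original IDs. If the same (customer, item) pair appears more than once, csr_matrix sums the quantities for that cell, which may or may not be what you want.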