I am building a LightFM recommender in Python. I have a pandas dataframe of transactions: each row is a purchase and has a customer id and a product id. I realise that to build my model I need to convert this to a sparse matrix with one row per customer and one column per product, containing a 1 when they purchased and a 0 otherwise. I have about 10 million rows.
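To make the shape of the data concrete, here is a toy stand-in for my real dataframe (the ids below are made up purely for illustration):

import pandas as pd

# Toy version of my transactions dataframe: one row per purchase, and the
# same customer/product pair can appear more than once.
df = pd.DataFrame(
    {
        "customer_id": ["c1", "c2", "c1", "c3", "c2", "c1"],
        "product_id": ["p9", "p3", "p3", "p9", "p9", "p3"],
    }
)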
My code below first pivots my df to the right shape for a matrix, fills in 0s, then converts to a sparse matrix.
The problem is that when I pivot my df, the resulting dense matrix is understandably enormous: far too big to fit in memory!
Is there any way I can skip this step, or perhaps use some sort of lazy evaluation, so that I never have to create this giant memory-busting dense matrix in the first place?
import pandas as pd
from scipy.sparse import csr_matrix, coo_matrix

# Flag every transaction with a 1, then pivot to a customers x products table.
# pivot_table collapses duplicate purchases of the same product (max of 1s is 1).
df["purchased"] = 1
train_matrix = df.pivot_table(
    index="customer_id", columns="product_id", values="purchased", aggfunc="max"
)
train_matrix.fillna(0, inplace=True)

# Convert the dense pivot to a sparse CSR matrix for LightFM.
sparse_train_matrix = (
    train_matrix.astype(pd.SparseDtype("float64", 0)).sparse.to_coo().tocsr()
)
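For what it's worth, I suspect the answer involves building the sparse matrix directly from the two id columns, something like the rough, untested sketch below (using scipy's coo_matrix and pandas categorical codes), but I'm not sure whether this is the right way to prepare the interactions for LightFM:

import numpy as np
from scipy.sparse import coo_matrix

# Rough, untested sketch: map each id to a consecutive integer index and build
# the sparse interaction matrix straight from the transaction rows, so the
# dense pivot is never materialised.
dedup = df.drop_duplicates(["customer_id", "product_id"])
customers = dedup["customer_id"].astype("category")
products = dedup["product_id"].astype("category")

interactions = coo_matrix(
    (
        np.ones(len(dedup), dtype="float32"),       # a single 1 per customer/product pair
        (customers.cat.codes, products.cat.codes),  # row and column indices
    ),
    shape=(customers.cat.categories.size, products.cat.categories.size),
).tocsr()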