I am building a LightFM recommender in Python. I have a pandas dataframe of transactions: each row is a purchase and has a customer id and a product id. I realise that to build my model I need to convert this to a sparse matrix with one row per customer and one column per product, containing a 1 when they purchased and a 0 otherwise. I have about 10 million rows.
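To make the shape of the data concrete, here is a toy stand-in for my real dataframe (the ids below are made up purely for illustration):

import pandas as pd

# Toy version of my transactions dataframe: one row per purchase, and the
# same customer/product pair can appear more than once.
df = pd.DataFrame(
    {
        "customer_id": ["c1", "c2", "c1", "c3", "c2", "c1"],
        "product_id": ["p9", "p3", "p3", "p9", "p9", "p3"],
    }
)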
My code below first pivots my df to the right shape for a matrix, fills in 0s, then converts to a sparse matrix.
The problem is that when I pivot my df, the resulting dense matrix is understandably enormous: far too big to fit in memory!
Is there any way I can skip this step, or perhaps use some sort of lazy evaluation, so that I never have to create this giant memory-busting dense matrix in the first place?
import pandas as pd
from scipy.sparse import csr_matrix, coo_matrix

# Flag every transaction with a 1, then pivot to a customers x products table.
# pivot_table collapses duplicate purchases of the same product (max of 1s is 1).
df["purchased"] = 1
train_matrix = df.pivot_table(
    index="customer_id", columns="product_id", values="purchased", aggfunc="max"
)
train_matrix.fillna(0, inplace=True)

# Convert the dense pivot to a sparse CSR matrix for LightFM.
sparse_train_matrix = (
    train_matrix.astype(pd.SparseDtype("float64", 0)).sparse.to_coo().tocsr()
)
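For what it's worth, I suspect the answer involves building the sparse matrix directly from the two id columns, something like the rough, untested sketch below (using scipy's coo_matrix and pandas categorical codes), but I'm not sure whether this is the right way to prepare the interactions for LightFM:

import numpy as np
from scipy.sparse import coo_matrix

# Rough, untested sketch: map each id to a consecutive integer index and build
# the sparse interaction matrix straight from the transaction rows, so the
# dense pivot is never materialised.
dedup = df.drop_duplicates(["customer_id", "product_id"])
customers = dedup["customer_id"].astype("category")
products = dedup["product_id"].astype("category")

interactions = coo_matrix(
    (
        np.ones(len(dedup), dtype="float32"),       # a single 1 per customer/product pair
        (customers.cat.codes, products.cat.codes),  # row and column indices
    ),
    shape=(customers.cat.categories.size, products.cat.categories.size),
).tocsr()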