I have a dataset with 8 columns and 1482531 rows in each column.
I am trying to build a content-based recommendation system by
computing cosine similarities with linear_kernel in Python,
but after about half an hour it throws a memory error.
Is this due to the size of the dataset, and if so, is there a way to solve this issue?
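For scale, I estimated the size of the output (assuming linear_kernel returns a dense float64 array, which it does even for sparse input):

n_rows = 1482535               # rows in my TF-IDF matrix
print(n_rows ** 2 * 8 / 1e12)  # ~17.6 TB for the full dense similarity matrix

So the full item-item similarity matrix is far larger than any machine's RAM, which I suspect explains the error. Here is my code: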
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import linear_kernel
from sklearn.model_selection import train_test_split
dataset = pd.read_csv('C:/data2/train.tsv',sep='\t', low_memory=False)
# min_df=0 is rejected by newer scikit-learn releases; min_df=1 keeps
# every term that appears at least once, which is the same vocabulary
tf = TfidfVectorizer(analyzer='word', ngram_range=(1, 3), min_df=1, stop_words='english')
# replace missing descriptions before vectorizing
dataset['item_description'] = dataset['item_description'].fillna('')
# build the sparse TF-IDF matrix from the descriptions
tfidf_matrix = tf.fit_transform(dataset['item_description'])
tfidf_matrix.shape  # (1482535, 13831759)
# this is where it fails: linear_kernel tries to allocate
# a dense 1482535 x 1482535 result
cosine_similarities = linear_kernel(tfidf_matrix, tfidf_matrix)
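Would a chunked top-k approach like the sketch below be the right direction? It is only a sketch, and the values of k and chunk_size are arbitrary choices of mine: instead of materializing the whole similarity matrix, it computes similarities for a small block of rows at a time and keeps only the k most similar items per row.

import numpy as np
from sklearn.metrics.pairwise import linear_kernel

def top_k_similar(tfidf_matrix, k=10, chunk_size=100):
    """For each row, return indices of the k most similar other rows.

    Only a chunk_size x n_rows block is ever dense in memory
    (about 1.2 GB per block at these dimensions with chunk_size=100),
    instead of the full n_rows x n_rows matrix.
    """
    n_rows = tfidf_matrix.shape[0]
    top_indices = np.zeros((n_rows, k), dtype=np.int64)
    for start in range(0, n_rows, chunk_size):
        stop = min(start + chunk_size, n_rows)
        # dense block of shape (chunk, n_rows)
        sims = linear_kernel(tfidf_matrix[start:stop], tfidf_matrix)
        # mask self-similarity so an item does not recommend itself
        sims[np.arange(stop - start), np.arange(start, stop)] = -1.0
        # argpartition finds the k largest per row without a full sort
        part = np.argpartition(sims, -k, axis=1)[:, -k:]
        # order those k by similarity, descending
        vals = np.take_along_axis(sims, part, axis=1)
        order = np.argsort(vals, axis=1)[:, ::-1]
        top_indices[start:stop] = np.take_along_axis(part, order, axis=1)
    return top_indices

I have also seen NearestNeighbors(metric='cosine') suggested, which accepts sparse input with the brute-force algorithm; would that be a better fit here?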