In Jupyter Notebook I am trying to find the best number of clusters for KMeans by computing the silhouette score for each value of k.
    import pandas as pd
    from sklearn.cluster import KMeans
    from sklearn.metrics import silhouette_score

    df = pd.read_csv('single_family_home_values.csv')
    X = df[['bedrooms', 'bathrooms', 'rooms', 'squareFootage',
            'lotSize', 'yearBuilt', 'priorSaleAmount']]
    X = X.fillna(0)  # assignment instead of inplace avoids SettingWithCopyWarning on a slice

    for i in range(3, 10):
        kmean = KMeans(n_clusters=i).fit(X)
        labels = kmean.labels_
        print(silhouette_score(X, labels))
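To make this reproducible without the CSV, here is a self-contained version of the same loop on synthetic data of roughly the same size. The data is random and the `sample_size` argument is a workaround I am only assuming helps, not something I have confirmed fixes the error:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Synthetic stand-in for the CSV: ~8947 rows, 7 numeric features,
# matching the real data's size (the values themselves are random).
rng = np.random.default_rng(0)
X = rng.normal(size=(8947, 7))

scores = []
for i in range(3, 10):
    labels = KMeans(n_clusters=i, n_init=10, random_state=0).fit_predict(X)
    # sample_size scores a random subset instead of computing all
    # pairwise distances, which keeps memory bounded; I am not sure
    # whether this is an acceptable substitute for the full score.
    s = silhouette_score(X, labels, sample_size=2000, random_state=0)
    scores.append(s)
    print(i, s)
```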
But it raises a MemoryError:
MemoryError Traceback (most recent call last)
<ipython-input-36-ca9617d8baf5> in <module>
2 kmean = KMeans(n_clusters=i).fit(X)
3 labels = kmean.labels_
----> 4 print(silhouette_score(X,labels))
MemoryError: Unable to allocate 1.00 GiB for an array with shape (8947, 15000) and data type float64
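For what it's worth, the size in the error message is consistent with a float64 array of that shape:

```python
# 8947 * 15000 float64 values at 8 bytes each
n_bytes = 8947 * 15000 * 8
print(n_bytes / 2**30)  # just under 1.00 GiB, matching the traceback
```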
I have tried closing all other tabs and windows, and I gave Jupyter Notebook administrator privileges. Another question suggested this error is due to the system's overcommit handling mode, but I don't know how to change that on Windows (it appears to be a Linux setting), and I have repeatedly read that changing it is not recommended. I also tried increasing the max buffer size to 4 GB and setting the pagefile to an initial size of 24000 MB and a maximum of 72000 MB, but none of these worked.
I also tried running the file in VS Code instead, and the same error occurred.
I am using 64-bit Windows 10 with 16.0 GB of RAM.
Here is the csv file: single_family_home_values.csv
Here is the Jupyter config file: jupyter_notebook_config.py