In a Jupyter Notebook, I am trying to find the best number of clusters for KMeans by computing the silhouette score for each candidate.

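import pandas as pd
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

df = pd.read_csv('single_family_home_values.csv')

# Drop the target column, then keep only the numeric feature columns
X = df.drop('estimated_value', axis=1)
X = X[['bedrooms','bathrooms','rooms','squareFootage','lotSize','yearBuilt','priorSaleAmount']]

# Replace missing values with 0 (reassigning avoids pandas' SettingWithCopyWarning)
X = X.fillna(0)

# Fit KMeans for each candidate cluster count and print its silhouette score
for i in range(3, 10):
    kmean = KMeans(n_clusters=i).fit(X)
    labels = kmean.labels_
    print(silhouette_score(X,labels))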

But it raises a MemoryError:

MemoryError                               Traceback (most recent call last)
<ipython-input-36-ca9617d8baf5> in <module>
      2     kmean = KMeans(n_clusters=i).fit(X)
      3     labels = kmean.labels_
----> 4     print(silhouette_score(X,labels))

MemoryError: Unable to allocate 1.00 GiB for an array with shape (8947, 15000) and data type float64
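For reference, the size in the message can be reproduced with plain arithmetic. The sketch below assumes that 15000 is the number of rows in `X`. It also assumes that 8947 is the chunk height scikit-learn picks for its chunked pairwise-distance computation under its default `working_memory` setting of 1024 MiB. Neither assumption is confirmed by the traceback itself, but the numbers are consistent with it.

```python
# Size of an (8947, 15000) float64 array, as reported in the error message.
rows, cols = 8947, 15000
bytes_per_float64 = 8
total_bytes = rows * cols * bytes_per_float64
print(total_bytes)            # number of bytes requested
print(total_bytes / 2**30)    # the same size expressed in GiB

# Assumption: scikit-learn sizes each distance chunk so that it fits in
# `working_memory` MiB (default 1024). One row of distances needs cols * 8 bytes.
working_memory_mib = 1024
chunk_rows = (working_memory_mib * 2**20) // (cols * bytes_per_float64)
print(chunk_rows)             # rows of the distance matrix computed per chunk
```

If these assumptions hold, the failing allocation is a temporary buffer of pairwise distances inside `silhouette_score`, not the data in `X`, which is far smaller.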

I have tried closing all tabs and windows, and I gave Jupyter Notebook administrator privileges. Another question suggested that this is due to the system's overcommit handling mode, but I don't know how to change that mode on Windows, and I have read many times that changing it is not recommended. I also increased the max buffer size to 4 GB and set the pagefile initial size to 24000 MB and maximum size to 72000 MB, but none of these solutions worked.

I have also tried running the file in VS Code instead, and the same error occurred.

I am using 64-bit Windows 10 with 16.0 GB of RAM.

Here is the csv file: single_family_home_values.csv

Here is the Jupyter config file: jupyter_notebook_config.py

Mark Saleh
  • Does this answer your question? [How to increase Jupyter notebook Memory limit?](https://stackoverflow.com/questions/57948003/how-to-increase-jupyter-notebook-memory-limit) – Ezra Jul 05 '21 at 19:49
  • @Ezra I typed `jupyter notebook --NotebookApp.max_buffer_size=<4000000000>` in cmd, but cmd reported that the syntax of the command is incorrect. I created the jupyter_notebook_config.py file by typing `jupyter notebook --generate-config` in cmd. I tried to change max_buffer_size in the Python file, but `NameError: name 'c' is not defined` occurred. I have spent hours searching for a solution, but I could not find any. – Mark Saleh Jul 06 '21 at 00:57
  • What was the syntax error you got? Also, please post your config. – Ezra Jul 06 '21 at 04:31
  • I got `The syntax of the command is incorrect.` in cmd. I am sorry, but I don't know how to post files here. I am new here. – Mark Saleh Jul 06 '21 at 11:14
  • You can post it like you did your csv. How do you normally run your notebook? – Ezra Jul 06 '21 at 15:16
  • Oh, also: I’m pretty sure you should run `jupyter notebook --NotebookApp.max_buffer_size=4000000000` – Ezra Jul 06 '21 at 15:18
  • Yes, your syntax worked, but it still did not solve the problem. Here is the output `[W 2021-07-06 21:28:45.784 LabApp] 'max_buffer_size' has moved from NotebookApp to ServerApp. This config will be passed to ServerApp. Be sure to update your config before our next release.`. – Mark Saleh Jul 06 '21 at 19:44
  • I usually open Jupyter notebook using Anaconda command prompt. I simply type `jupyter notebook` – Mark Saleh Jul 06 '21 at 19:45
  • By the way, all lines in the config file are commented out. I am waiting for the file to be uploaded to GitHub, and I will post it here. – Mark Saleh Jul 06 '21 at 20:32
  • does it still crash when you run it with `jupyter notebook --NotebookApp.max_buffer_size=4000000000`? – Ezra Jul 06 '21 at 23:39
  • Yes. It outputs the same error. – Mark Saleh Jul 07 '21 at 00:02
  • Here is the config file: [link](https://github.com/Mark-S2004/.jupyter/blob/main/jupyter_notebook_config.py) – Mark Saleh Jul 13 '21 at 14:48
  • Maybe try `fit_predict()` instead of `fit()`. – B.Kocis Jul 14 '21 at 11:44
  • Nope, this did not work. – Mark Saleh Jul 14 '21 at 20:32
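
Since switching to `fit_predict()` made no difference, the allocation seems to come from `silhouette_score` rather than from fitting. The following is a minimal sketch of two options that might reduce its memory use. It is not a confirmed fix for this machine. It uses synthetic data (`X_demo`, a hypothetical stand-in for `X`), scikit-learn's `config_context(working_memory=...)` setting, and the `sample_size` parameter of `silhouette_score`. The values 64 and 500 are arbitrary choices for illustration.

```python
import numpy as np
import sklearn
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Synthetic stand-in for X (assumption: the real X is a numeric table with 7 columns).
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(2000, 7))

scores = {}
for k in range(3, 6):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X_demo)

    # Option 1: cap the temporary buffer (in MiB) used by chunked distance routines.
    with sklearn.config_context(working_memory=64):
        full = silhouette_score(X_demo, labels)

    # Option 2: estimate the score on a random subsample of the rows.
    sampled = silhouette_score(X_demo, labels, sample_size=500, random_state=0)

    scores[k] = (full, sampled)
    print(k, round(full, 3), round(sampled, 3))
```

Option 1 still computes the exact score but in smaller chunks. Option 2 trades exactness for a much smaller distance matrix, so its value is only an estimate.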

0 Answers