
I am trying to run a sentiment analysis on Google Cloud Platform (AI Platform). When I try to split the data into training and test sets, it shows a memory error like the one below:

MemoryError: Unable to allocate 194. GiB for an array with shape (414298,) and data type <U125872

How do I increase the memory size accordingly? Should I change the machine type of the instance? If so, which setting would be appropriate?
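For reference, the 194 GiB figure follows directly from the dtype in the error: a <U125872 array reserves 125872 UTF-32 code points (4 bytes each) for every element, however short the actual strings are. A quick check of the arithmetic (numbers taken from the error message above):

    rows, width = 414_298, 125_872
    print(rows * width * 4 / 2**30)   # ≈ 194.3 (GiB)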

  • Can you post your code? Please take a look at this one: https://stackoverflow.com/questions/57507832/unable-to-allocate-array-with-shape-and-data-type – gogasca Feb 05 '21 at 01:51
  • train, test = train_test_split(data, test_size =0.2, random_state=1) X_train = train['content'].values.astype('U') X_test = test['content'].values.astype('U') y_train = train['TextBlobTargetNumerical'] y_test = test['TextBlobTargetNumerical'] –  Feb 05 '21 at 12:04
  • Please take a look at the answer above; it looks to me that your array is just too big to fit into RAM and you need to use pagination. Try enabling swap in the VM – gogasca Feb 05 '21 at 18:35

2 Answers

0

From the error, it seems the VM is out of memory.

1 - Create a new Notebook with another machine type. To do this, go to AI Platform > Notebooks and click on NEW INSTANCE. Select the option that best fits you (R 3.6, Python 2 and 3, etc.) and click on ADVANCED OPTIONS in the pane that pops up. In the Machine configuration area you can pick a machine type with more memory.

Please start with n1-standard-16 or n1-highmem-8, and if neither of those works, jump to n1-standard-32 or n1-highmem-16.

You can also change the machine type from the command line (the instance must be stopped before its machine type can be changed):

gcloud compute instances set-machine-type INSTANCE_NAME \
    --machine-type NEW_MACHINE_TYPE

2 - Change the dtype. If you are working with the np.float64 type, you can change it to np.float32 in order to reduce size. For example, a line like result = np.empty(self.shape, dtype=dtype) can be changed to result = np.empty(self.shape, dtype=np.float32).
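As a rough illustration of how much the dtype alone matters (a minimal sketch on made-up data, not the question's dataset):

    import numpy as np

    a64 = np.zeros(1_000_000, dtype=np.float64)   # 8 bytes per element
    a32 = a64.astype(np.float32)                  # 4 bytes per element
    print(a64.nbytes / 1e6, "MB")                 # 8.0 MB
    print(a32.nbytes / 1e6, "MB")                 # 4.0 MB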

If you don't want to modify your code, I suggest you follow the first option.

Mahboob
  • Thank you so much for your reply. I have 500,000 (5 lakh) rows in my dataset, with just two columns: the text and the target for sentiment analysis. I was able to fit only 50,000 rows on my local PC; that's the reason I switched to GCP. But I am having the same problem again. I changed my machine type to 16 CPUs and 104 GB of RAM (n1-highmem-16), but I am still only able to fit 75,000 rows. –  Feb 04 '21 at 15:21
  • Also, I am converting the text column with X_train = train['content'].values.astype('U'). I think the memory error may be due to this. How can I reduce the size of the text? Is there any way? –  Feb 04 '21 at 15:33
  • @Nikitha JV, can you post your code snippet? – Nick_Kh Feb 05 '21 at 08:40
  • train, test = train_test_split(data, test_size =0.2, random_state=1) X_train = train['content'].values.astype('U') X_test = test['content'].values.astype('U') y_train = train['TextBlobTargetNumerical'] y_test = test['TextBlobTargetNumerical'] –  Feb 05 '21 at 12:04
  • Have you tried to enable swap in the VM as per @gogasca's suggestion? – Nick_Kh Feb 08 '21 at 10:02
  • @NikithaJV did this solution help? I'm having the same problem: the VM memory is not completely utilized, but it says it needs more memory and I can't use more observations because of this. How did you manage to solve it? – Shan Khan Apr 23 '22 at 21:17
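Regarding the .astype('U') conversion discussed in the comments above: a fixed-width NumPy Unicode array reserves space for the longest string in every row, which is why a single very long document can blow the allocation up even when most rows are short (that is exactly what the <U125872 dtype in the error indicates). Keeping the column as plain Python strings (a pandas Series or object array) avoids this, and scikit-learn's train_test_split and text vectorizers accept a Series or list of strings directly, so the conversion can usually be dropped (use fillna('') to deal with missing values instead). A minimal sketch with made-up data, reusing the content column name from the question:

    import pandas as pd

    # One long document forces the fixed-width dtype to reserve that width for every row.
    content = pd.Series(["short review"] * 999 + ["x" * 10_000])

    fixed = content.values.astype("U")           # dtype <U10000: 1000 rows * 10000 chars * 4 bytes
    print(fixed.dtype, fixed.nbytes)             # ~40 MB for ~22 KB of actual text

    as_objects = content.fillna("").to_numpy()   # dtype object: 8-byte references to the strings
    print(as_objects.dtype, as_objects.nbytes)   # the strings themselves live on the heap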
0

Changing the machine type to one with enough resources is necessary but might not be sufficient. As indicated here, the Jupyter-as-a-service settings also need to allow for greater memory usage. Make sure of this by following these steps:

  1. Open a terminal on your Jupyter instance and run the following command:
sudo nano /lib/systemd/system/jupyter.service
  2. Check whether the MemoryHigh and MemoryMax parameters in the text editor that opens (like the example shown below) are set to your desired capacity, and change them if not. The values are in bytes; the ones shown below correspond to roughly 3.3 GiB.
[Unit]
Description=Jupyter Notebook
[Service]
Type=simple
PIDFile=/run/jupyter.pid
CPUQuota=97%
MemoryHigh=3533868160
MemoryMax=3583868160
ExecStart=/bin/bash --login -c '/opt/conda/bin/jupyter lab --config=/home/jupyter/.jupyter/jupyter_notebook_config.py'
User=jupyter
Group=jupyter
WorkingDirectory=/home/jupyter
Restart=always
[Install]
WantedBy=multi-user.target
  3. Save and exit, then run sudo systemctl daemon-reload followed by sudo systemctl restart jupyter.service so the new limits take effect.

Finally, run the following command on the terminal:

echo 1 | sudo tee /proc/sys/vm/overcommit_memory

This sets the kernel to always allow memory overcommit, so the Jupyter instance can make full use of the VM's resources. Note that the setting does not persist across reboots.

Farid