
I am trying to run a sentiment analysis on Google Cloud Platform (AI Platform). When I try to split the data into training and test sets, it shows a memory error like the one below:

MemoryError: Unable to allocate 194. GiB for an array with shape (414298,) and data type <U125872

How do I increase the memory size accordingly? Should I change the machine type of the instance? If so, which setting would be appropriate?
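For reference, the 194 GiB figure follows directly from the dtype in the error: a <U125872 array reserves 125872 UTF-32 code points (4 bytes each) for every element, however short the actual strings are. A quick check of the arithmetic (numbers taken from the error message above):

    rows, width = 414_298, 125_872
    print(rows * width * 4 / 2**30)   # ≈ 194.3 (GiB)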

  • Can you post your code? Please take a look at this one: https://stackoverflow.com/questions/57507832/unable-to-allocate-array-with-shape-and-data-type – gogasca Feb 05 '21 at 01:51
  • train, test = train_test_split(data, test_size =0.2, random_state=1) X_train = train['content'].values.astype('U') X_test = test['content'].values.astype('U') y_train = train['TextBlobTargetNumerical'] y_test = test['TextBlobTargetNumerical'] –  Feb 05 '21 at 12:04
  • Please take a look at the answer above; it looks to me that your array is just too big to fit into RAM and you need to use pagination. Try enabling swap in the VM – gogasca Feb 05 '21 at 18:35

2 Answers

0

From the error, it seems the VM is out of memory.

1 - Create a new Notebook with another machine type. To do this, go to AI Platform > Notebooks and click on NEW INSTANCE. Select the option that best fits you (R 3.6, Python 2 and 3, etc.) and click on ADVANCED OPTIONS in the pane that pops up. In the Machine configuration area you can pick a machine type with more memory.

Please start with n1-standard-16 or n1-highmem-8, and if neither of those works, jump to n1-standard-32 or n1-highmem-16.

You can also change the machine type from the command line (the instance must be stopped before its machine type can be changed):

gcloud compute instances set-machine-type INSTANCE_NAME \
    --machine-type NEW_MACHINE_TYPE

2 - Change the dtype. If you are working with the np.float64 type, you can change it to np.float32 in order to reduce size. For example, a line like result = np.empty(self.shape, dtype=dtype) can be changed to result = np.empty(self.shape, dtype=np.float32).
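As a rough illustration of how much the dtype alone matters (a minimal sketch on made-up data, not the question's dataset):

    import numpy as np

    a64 = np.zeros(1_000_000, dtype=np.float64)   # 8 bytes per element
    a32 = a64.astype(np.float32)                  # 4 bytes per element
    print(a64.nbytes / 1e6, "MB")                 # 8.0 MB
    print(a32.nbytes / 1e6, "MB")                 # 4.0 MB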

If you don't want to modify your code, I suggest you follow the first option.

Mahboob
  • Thank you so much for your reply. I have 500,000 (5 lakh) rows in my dataset, with just two columns: the text and the target for sentiment analysis. I was able to fit only 50,000 rows on my local PC; that's the reason I switched to GCP. But I am having the same problem again. I changed my machine type to 16 CPUs and 104 GB of RAM (n1-highmem-16), but I am still only able to fit 75,000 rows. –  Feb 04 '21 at 15:21
  • Also, I am converting the text column with X_train = train['content'].values.astype('U'). I think the memory error may be due to this. How can I reduce the size of the text? Is there any way? –  Feb 04 '21 at 15:33
  • @Nikitha JV, can you post your code snippet? – Nick_Kh Feb 05 '21 at 08:40
  • train, test = train_test_split(data, test_size =0.2, random_state=1) X_train = train['content'].values.astype('U') X_test = test['content'].values.astype('U') y_train = train['TextBlobTargetNumerical'] y_test = test['TextBlobTargetNumerical'] –  Feb 05 '21 at 12:04
  • Have you tried to enable swap in the VM as per @gogasca's suggestion? – Nick_Kh Feb 08 '21 at 10:02
  • @NikithaJV did this solution help? I'm having the same problem: the VM memory is not completely utilized, but it says it needs more memory and I can't use more observations because of this. How did you manage to solve it? – Shan Khan Apr 23 '22 at 21:17
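Regarding the .astype('U') conversion discussed in the comments above: a fixed-width NumPy Unicode array reserves space for the longest string in every row, which is why a single very long document can blow the allocation up even when most rows are short (that is exactly what the <U125872 dtype in the error indicates). Keeping the column as plain Python strings (a pandas Series or object array) avoids this, and scikit-learn's train_test_split and text vectorizers accept a Series or list of strings directly, so the conversion can usually be dropped (use fillna('') to deal with missing values instead). A minimal sketch with made-up data, reusing the content column name from the question:

    import pandas as pd

    # One long document forces the fixed-width dtype to reserve that width for every row.
    content = pd.Series(["short review"] * 999 + ["x" * 10_000])

    fixed = content.values.astype("U")           # dtype <U10000: 1000 rows * 10000 chars * 4 bytes
    print(fixed.dtype, fixed.nbytes)             # ~40 MB for ~22 KB of actual text

    as_objects = content.fillna("").to_numpy()   # dtype object: 8-byte references to the strings
    print(as_objects.dtype, as_objects.nbytes)   # the strings themselves live on the heap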
0

Changing the machine type to one with enough resources is necessary but might not be sufficient. As indicated here, the Jupyter-as-a-service settings also need to allow for greater memory usage. Make sure of this by following these steps:

  1. Open a terminal on your Jupyter instance and run the following command:
sudo nano /lib/systemd/system/jupyter.service
  2. Check whether the MemoryHigh and MemoryMax parameters in the text editor that opens (like the example shown below) are set to your desired capacity, and change them if not. The values are in bytes; the ones shown below correspond to roughly 3.3 GiB.
[Unit]
Description=Jupyter Notebook
[Service]
Type=simple
PIDFile=/run/jupyter.pid
CPUQuota=97%
MemoryHigh=3533868160
MemoryMax=3583868160
ExecStart=/bin/bash --login -c '/opt/conda/bin/jupyter lab --config=/home/jupyter/.jupyter/jupyter_notebook_config.py'
User=jupyter
Group=jupyter
WorkingDirectory=/home/jupyter
Restart=always
[Install]
WantedBy=multi-user.target
  3. Save and exit, then run sudo systemctl daemon-reload followed by sudo systemctl restart jupyter.service so the new limits take effect.

Finally, run the following command on the terminal:

echo 1 | sudo tee /proc/sys/vm/overcommit_memory

This sets the kernel to always allow memory overcommit, so the Jupyter instance can make full use of the VM's resources. Note that the setting does not persist across reboots.

Farid