
Using spaCy v3, I am trying to train a classifier using CamemBERT and I get a CUDA out of memory error. To resolve this issue I read that I should decrease the batch size, but I'm confused about which parameter I should change between:

  • [nlp] batch_size
  • [components.transformer] max_batch_items
  • [corpora.train or dev] max_length
  • [training.batcher] size
  • [training.batcher] buffer

I tried to understand the difference between each parameter:

  1. [nlp] batch_size

Default batch size for pipe and evaluate. Defaults to 1000.

Are those functions used in the training / evaluation process?
In the quickstart widget (https://spacy.io/usage/training#quickstart), why do the values differ depending on the hardware? 1000 for CPU and 128 for GPU.
During the training process, will evaluation be slower if this value is low?
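As I understand it, this parameter only controls how many Docs are fed through the pipeline at once by nlp.pipe and nlp.evaluate; on GPU a whole batch must fit in memory at the same time, which would explain the smaller default there. A minimal pure-Python sketch of that fixed-size batching (the helper and texts are illustrative, not spaCy internals):

```python
from itertools import islice

def minibatch(items, batch_size):
    """Yield successive fixed-size batches, like nlp.pipe's batching."""
    it = iter(items)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch

texts = [f"document {i}" for i in range(10)]
batches = list(minibatch(texts, batch_size=4))
# 10 texts with batch_size=4 -> batches of 4, 4 and 2
print([len(b) for b in batches])  # [4, 4, 2]
```

A smaller batch_size means more, smaller batches, so each one needs less GPU memory at the cost of more iterations.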

  2. [components.transformer] max_batch_items

Maximum size of a padded batch. Defaults to 4096.

According to the warning message "Token indices sequence length is longer than the specified maximum sequence length for this model (556 > 512). Running this sequence through the model will result in indexing errors", explained here (https://github.com/explosion/spaCy/issues/6939), the CamemBERT model has a specified maximum sequence length of 512.

Is the parameter max_batch_items overloaded to this value? Should I change it to 512?
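To clarify the distinction as I understand it: the 512-token limit applies to each individual sequence, while max_batch_items caps the total number of padded items in a batch, roughly the number of sequences times the longest one. A rough sketch of that arithmetic (not spaCy's actual implementation):

```python
def padded_batch_items(seq_lengths):
    """Size of a padded batch: every sequence is padded to the longest one."""
    return len(seq_lengths) * max(seq_lengths)

# Three sequences padded to length 500 -> 1500 padded items (under 4096),
# yet a single 556-token sequence would still exceed the model's 512-token
# per-sequence limit, so the two settings guard against different problems.
print(padded_batch_items([500, 300, 120]))  # 1500
```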

  3. [corpora.train or dev] max_length

In my understanding, this value should be equal to or lower than the maximum sequence length. In the quickstart widget this value is set to 500 for the training set and 0 for the dev set. If it is set to 0, will it be overloaded to the maximum sequence length of the transformer model?
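From what I can tell from the Corpus reader docs, max_length drops (or splits, when sentence boundaries are available) documents longer than the limit, and 0 means no limit at all rather than falling back to the transformer's maximum. A simplified sketch of that filtering behaviour (illustrative only, operating on token counts instead of Docs):

```python
def filter_by_max_length(doc_lengths, max_length):
    """Keep docs up to max_length tokens; max_length == 0 disables the filter."""
    if max_length == 0:
        return list(doc_lengths)
    return [n for n in doc_lengths if n <= max_length]

lengths = [120, 500, 556, 800]
print(filter_by_max_length(lengths, 512))  # [120, 500]
print(filter_by_max_length(lengths, 0))    # [120, 500, 556, 800] (no limit)
```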

  4. [training.batcher] size for spacy.batch_by_padded.v1

The largest padded size to batch sequences into. Can also be a block referencing a schedule, e.g. compounding.

If I don't use compounding, how is this parameter different from max_length?
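The difference, as I understand it: max_length bounds each individual document, while the batcher's size bounds the padded size of a whole batch. A simplified sketch of batch_by_padded-style batching (buffer the sequences, sort by length so similar lengths are grouped, then cap each batch's padded size); this is illustrative, not the actual spaCy implementation, and it ignores discard_oversize:

```python
def batch_by_padded(seq_lengths, size, buffer):
    """Group sequences so each batch's padded size (count * longest) <= size."""
    batches = []
    for start in range(0, len(seq_lengths), buffer):
        # Sort each buffer by length so similar lengths end up together.
        chunk = sorted(seq_lengths[start:start + buffer])
        batch = []
        for n in chunk:
            longest = max(batch + [n])
            if batch and (len(batch) + 1) * longest > size:
                batches.append(batch)
                batch = []
            batch.append(n)
        if batch:
            batches.append(batch)
    return batches

batches = batch_by_padded([50, 400, 60, 30, 200], size=500, buffer=5)
for b in batches:
    assert len(b) * max(b) <= 500  # padded size never exceeds `size`
print(batches)  # [[30, 50, 60], [200], [400]]
```

So a document of 600 tokens would be removed by max_length = 512, whereas size = 500 would still batch it (alone) unless discard_oversize is true.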

Here are some parts of my config file:

[nlp]
lang = "fr"
pipeline = ["transformer","textcat"]
# Default batch size to use with nlp.pipe and nlp.evaluate
batch_size = 256
...

[components.transformer]
factory = "transformer"
# Maximum size of a padded batch. Defaults to 4096.
max_batch_items = 4096
...

[corpora.dev]
@readers = "spacy.Corpus.v1"
path = ${paths.dev}
# Limitations on training document length
max_length = 512
...

[training.batcher]
@batchers = "spacy.batch_by_padded.v1"
discard_oversize = true
# The largest padded size to batch sequences into. Can also be a block referencing a schedule, e.g. compounding.
size = 500
# The number of sequences to accumulate before sorting by length. A larger buffer will result in more even sizing, but if the buffer is very large, the iteration order will be less random, which can result in suboptimal training.
buffer = 128
get_length = null
...

Marien

2 Answers


How much memory does your GPU have?

Under spaCy 2.x, I was able to use a 6GB GPU. But (if memory serves me right) the spaCy 3 documentation recommends 10-12 GB. I tried various parameters, but my GPU's 6GB of memory is mostly used up by the PyTorch load, and hence I run out of GPU memory fairly soon regardless of the batch_size tweaking. And this applies not only to transformers, but also to the plain NER EntityRecognizer - spaCy 3 simply loads the GPU with much more 'stuff' than spaCy 2 used to.

mbrunecky
  • I have 16GB. But for now, the only way I could resolve the OOM exception is to set discard_oversize = true – Marien Jul 27 '21 at 08:51

I was having the same CUDA out of memory problem. Reducing batch_size under [nlp] alone did not solve it. I then reduced the batcher size to 250, then 125, and it worked. Training takes a lot longer, but that is to be expected with a smaller batch size.