Using spaCy v3, I'm trying to train a text classifier with CamemBERT and I'm running into a CUDA out-of-memory error.
To resolve this issue I read that I should decrease the batch size, but I'm confused about which of these parameters I should change:
- [nlp] batch_size
- [components.transformer] max_batch_items
- [corpora.train or dev] max_length
- [training.batcher] size
- [training.batcher] buffer
I've tried to understand the difference between these parameters:
- [nlp] batch_size
Default batch size for pipe and evaluate. Defaults to 1000.
Are these functions used during the training/evaluation process?
In the quickstart widget (https://spacy.io/usage/training#quickstart), why do the values differ depending on the hardware: 1000 for CPU and 128 for GPU?
During training, will evaluation be slower if this value is low?
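For context, my current understanding is that this default only applies when no explicit batch size is passed, along these lines (a minimal sketch; the model path and texts are placeholders):

import spacy

nlp = spacy.load("./output/model-best")  # placeholder path to a trained pipeline
texts = ["Première phrase à classer.", "Deuxième phrase à classer."]

# Uses [nlp] batch_size (256 in my config) as the default:
docs = list(nlp.pipe(texts))

# An explicit argument overrides the config default:
docs = list(nlp.pipe(texts, batch_size=32))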
- [components.transformer] max_batch_items
Maximum size of a padded batch. Defaults to 4096.
According to the warning message
"Token indices sequence length is longer than the specified maximum sequence length for this model (556 > 512). Running this sequence through the model will result in indexing errors"
explained here (https://github.com/explosion/spaCy/issues/6939), the CamemBERT model has a specified maximum sequence length of 512.
Is max_batch_items overridden by this 512 limit? Should I change its value to 512?
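For what it's worth, I believe the 512-token limit is normally handled by the span getter in the transformer model block, which slices long documents into overlapping windows before they reach the model (the window/stride values are the quickstart defaults; the model name is my assumption for CamemBERT):

[components.transformer.model]
@architectures = "spacy-transformers.TransformerModel.v1"
name = "camembert-base"

[components.transformer.model.get_spans]
@span_getters = "spacy-transformers.strided_spans.v1"
window = 128
stride = 96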
- [corpora.train or dev] max_length
As I understand it, this value should be less than or equal to the maximum sequence length. In the quickstart widget it is set to 500 for the training set and 0 for the dev set. If set to 0, does it fall back to the maximum sequence length of the transformer model?
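To make the comparison concrete, here is how I read the quickstart's two corpus readers (the comments are my interpretation, which is part of what I'm asking):

[corpora.train]
@readers = "spacy.Corpus.v1"
path = ${paths.train}
# training documents longer than this are skipped (or split into sentences?)
max_length = 500

[corpora.dev]
@readers = "spacy.Corpus.v1"
path = ${paths.dev}
# 0 = no limit?
max_length = 0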
- [training.batcher] size for spacy.batch_by_padded.v1
The largest padded size to batch sequences into. Can also be a block referencing a schedule, e.g. compounding.
If I don't use compounding, how is this parameter different from max_length?
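For completeness, I believe the schedule variant mentioned in the docs would look something like this (the start/stop/compound values are arbitrary examples):

[training.batcher]
@batchers = "spacy.batch_by_padded.v1"
discard_oversize = true
buffer = 128
get_length = null

[training.batcher.size]
@schedules = "compounding.v1"
start = 100
stop = 1000
compound = 1.001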
Here are the relevant parts of my config file:
[nlp]
lang = "fr"
pipeline = ["transformer","textcat"]
# Default batch size to use with nlp.pipe and nlp.evaluate
batch_size = 256
...
[components.transformer]
factory = "transformer"
# Maximum size of a padded batch. Defaults to 4096.
max_batch_items = 4096
...
[corpora.dev]
@readers = "spacy.Corpus.v1"
path = ${paths.dev}
# Limit on document length
max_length = 512
...
[training.batcher]
@batchers = "spacy.batch_by_padded.v1"
discard_oversize = true
# The largest padded size to batch sequences into. Can also be a block referencing a schedule, e.g. compounding.
size = 500
# The number of sequences to accumulate before sorting by length. A larger buffer will result in more even sizing, but if the buffer is very large, the iteration order will be less random, which can result in suboptimal training.
buffer = 128
get_length = null
...