
I am learning the Hugging Face Transformers library and looked at this course.

I saw some code samples like the ones below and, after some additional Google searching, realized that things like this could be done. I decided to explore the documentation; searching for TFAutoModelForSequenceClassification returned this page

but that page doesn't explain methods/attributes such as `model.summary()` or `model2.layers[0].trainable`. It doesn't even mention that there is a `num_labels` parameter in `TFAutoModelForSequenceClassification.from_pretrained`. I only found out about these things because my other Google searches surfaced them.

Is there any better documentation where I could learn about such things? I am finding it incredibly difficult to discover the different things I could do.

!pip install datasets transformers[sentencepiece]

from transformers import TFAutoModelForSequenceClassification

checkpoint = "bert-base-uncased"

# Load the pretrained BERT encoder with a new two-class classification head
model = TFAutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=2)

model.summary()

this returns

All model checkpoint layers were used when initializing TFBertForSequenceClassification.

Some layers of TFBertForSequenceClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Model: "tf_bert_for_sequence_classification_1"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 bert (TFBertMainLayer)      multiple                  109482240 
                                                                 
 dropout_75 (Dropout)        multiple                  0         
                                                                 
 classifier (Dense)          multiple                  1538      
                                                                 
=================================================================
Total params: 109,483,778
Trainable params: 109,483,778
Non-trainable params: 0
_________________________________________________________________

Then

# (optional) freeze the BERT layer
# note: model2 is just another reference to the same object, not a copy
model2 = model
model2.layers[0].trainable = False
model2.summary()

returns

Model: "tf_bert_for_sequence_classification_1"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 bert (TFBertMainLayer)      multiple                  109482240 
                                                                 
 dropout_75 (Dropout)        multiple                  0         
                                                                 
 classifier (Dense)          multiple                  1538      
                                                                 
=================================================================
Total params: 109,483,778
Trainable params: 1,538
Non-trainable params: 109,482,240
_________________________________________________________________
user2543622

1 Answer


Unfortunately, I believe this is an inherent limitation of Python's design as a programming language, especially the use of `*args` and `**kwargs`.

This means that you will not directly find an explanation of these parameters on the instantiated class itself; they are likely just passed on to some superclass. In the case of transformers, it helps that `from_pretrained` can take any attribute that is set in the config itself. If you read the docs for `kwargs`, you can find the following description:

kwargs (additional keyword arguments, optional) — Can be used to update the configuration object (after it being loaded) and initiate the model (e.g., output_attentions=True). Behaves differently depending on whether a config is provided or automatically loaded:

  • If a configuration is provided with config, **kwargs will be directly passed to the underlying model’s init method (we assume all relevant updates to the configuration have already been done)
  • If a configuration is not provided, kwargs will be first passed to the configuration class initialization function PretrainedConfig.from_pretrained. Each key of kwargs that corresponds to a configuration attribute will be used to override said attribute with the supplied kwargs value. Remaining keys that do not correspond to any configuration attribute will be passed to the underlying model’s init function.
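
Here is a minimal sketch of those two code paths, assuming the same `bert-base-uncased` checkpoint from the question; both end up with a two-label classification head:

```python
from transformers import AutoConfig, TFAutoModelForSequenceClassification

checkpoint = "bert-base-uncased"

# Path 1: no explicit config -- num_labels is first routed into the
# configuration, and the model is then built from the updated config.
model_a = TFAutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=2)

# Path 2: explicit config -- all configuration changes are made up front,
# because remaining kwargs would be passed straight to the model's __init__.
config = AutoConfig.from_pretrained(checkpoint, num_labels=2)
model_b = TFAutoModelForSequenceClassification.from_pretrained(checkpoint, config=config)

print(model_a.config.num_labels)  # 2
print(model_b.config.num_labels)  # 2
```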

As for the other two questions (`model.summary()` and `trainable`): these are both inherited from Keras classes (a method and an attribute, respectively), so you should be able to find plenty of information in the Keras docs, e.g., here for `model.summary()`.
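
You can verify this inheritance yourself by inspecting the class hierarchy of the loaded model; a small sketch, reusing the model from the question:

```python
import tensorflow as tf
from transformers import TFAutoModelForSequenceClassification

model = TFAutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

# Every TF model in transformers ultimately subclasses tf.keras.Model,
# which is where summary() and the trainable attribute are defined.
print(isinstance(model, tf.keras.Model))  # True

# Print the full inheritance chain (method resolution order):
for cls in type(model).__mro__:
    print(f"{cls.__module__}.{cls.__name__}")
```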

dennlinger
  • When I do `type(model)` I get `transformers.models.bert.modeling_tf_bert.TFBertForSequenceClassification`. How did you figure out that we need to look into the Keras documentation? – user2543622 Dec 31 '21 at 14:26
  • The class [`TFPreTrainedModel`](https://github.com/huggingface/transformers/blob/v4.15.0/src/transformers/modeling_tf_utils.py#L647), which is a parent class of `TFBertForSequenceClassification`, inherits from the Keras class. – dennlinger Jan 01 '22 at 18:26
  • For 2), you can check the type of your loaded model, and then check (for example) the [implementation on Github](https://github.com/huggingface/transformers/blob/master/src/transformers/models/bert/modeling_tf_bert.py#L1685), which allows you to go to the definition of parent classes. Eventually, you should arrive at the mentioned class. As for 1), the question IMO just says "transformer docs are bad, are there better ones", which technically is even off-topic for SO, so I stand by my vote. – dennlinger Jan 02 '22 at 07:46