Fine tuning multiclass multilabel wav2vec2 model with transformers

Question

I have managed to adapt the HuggingFace audio classification tutorial to my own dataset:

https://github.com/mirix/messaih/blob/main/charts/fine_tune_w2v.py

I can now fine-tune a wav2vec model on my dataset. I am currently fine tuning a classifier on the sentiment label.

However, the dataset contains 6 other labels for emotion.

Each label can have up to 15 different classes.

The question is how to train a model using the six emotion labels as target simultaneously.

Would it be possible to group all six labels as a list or an array and use that as a single target?

I have found a few old posts and articles providing some pointers but I am not sure how up to date they are and I do not really understand the proposed solutions.

Any hints would be most appreciated.

By 6 other labels for emotion, do you mean those besides sentiment? So 6 columns, but where are these 15 different classes? I see floats corresponding to each column. — prijatelj, Aug 25 '23 at 05:28
If you could clarify the task, that'd help others understand the format of things for training. The floats for each emotion do not sum to 1 across all 6 (by row), so I interpret these as not mutually exclusive. If that's the case, then this appears to be a multi-task problem, so you'd have 6 outputs, 1 per emotion, each of which would be a regression. To your question, if you want the most likely emotion for the given audio input, then you can simply take the argmax across those 6 and make a new column with that as the label. Then it's just a multiclass classification task. — prijatelj, Aug 25 '23 at 05:34
@prijatelj You are absolutely right. The scores are not mutually exclusive; you can think of them as the "intensity" of the respective emotion, from 0 to 3. So you could have, for instance, anger = 0.65 and disgust = 1.35 with all the others being zero. Even though they are floats, however, they are discrete and can therefore be seen as categories. I said there were 15 categories for each emotion, but I could reduce them to 4-7 in case there is not enough data for prediction. — mirix, Aug 25 '23 at 12:39
Okay, I understand your task now. Typical multi-label classification appears to be binary per label (learned that last night), rather than multiclass per label. If your case fits that instead of 4-7 classes or regression to a vector, then Jun H seems to answer your question. I'd have to think more about a torch implementation for multi-label with multiple classes per emotion, or regression per emotion. — prijatelj, Aug 25 '23 at 14:49

score 0 · Answer 1 · answered Aug 25 '23 at 06:25

Multiclass Classification

Suppose the intent is for the model to perform the following multiclass classification task, where the output is a single integer corresponding to one class and all possible classes are mutually exclusive (so the output can only be one class at a time):

Schematic

input -> model -> output
------------------------
audio -> model -> most likely emotion class of 6

Then you can simply take that data table and calculate the argmax of the values across the 6 emotion columns:

import numpy as np

# Making example data: 4 random rows plus one all-zero row (no emotion present)
dat = np.random.rand(4, 6)
ex = np.vstack([dat, np.zeros(6)])

# Find all rows whose 6 values are identical (e.g. all zeros, so no meaningful argmax)
idx_of_all_same = (ex == ex[:, 0:1]).all(1).reshape(-1, 1)

# Argmax across all emotion columns to find the most likely emotion (indices 0-5)
label_encs = ex.argmax(1).reshape(-1, 1)

# Assign an extra class index to rows with no argmax (argmax would default to 0).
# Using 6 keeps the labels contiguous (0-6), which most classifiers expect.
label_encs[idx_of_all_same] = 6

# Append the label_encs as a new column to the original data
new_ex = np.hstack([ex, label_encs])

# If pandas dataframe
# df['Emotion Index'] = label_encs

These integers can now serve as the class label that your classifier will predict.

Multi-label Classification

Multilabel classification differs from multiclass classification in that it has multiple outputs, each of which is its own (potentially multiclass) classification output. In your case, you would have at most 6 outputs, one per emotion. Each output would either be continuous (regression on the strength of the emotion) or binary (logistic regression: 0 for the absence of that emotion and 1 for its presence).
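As a rough illustration of the two target formats just described, here is a minimal NumPy sketch. The `intensity` table is made-up example data (not taken from the actual dataset), and the threshold "any score above 0 counts as present" is an assumption you may want to change.

```python
import numpy as np

# Hypothetical intensity table: 4 clips x 6 emotions, scores from 0 to 3.
intensity = np.array([
    [0.00, 0.65, 1.35, 0.00, 0.00, 0.00],
    [2.00, 0.00, 0.00, 0.00, 0.35, 0.00],
    [0.00, 0.00, 0.00, 0.00, 0.00, 0.00],
    [0.00, 0.00, 0.00, 3.00, 0.00, 1.00],
])

# Option A: binary multi-label targets, one 0/1 flag per emotion.
# Assumption: an emotion counts as "present" whenever its score is above 0.
binary_targets = (intensity > 0).astype(np.float32)

# Option B: regression targets, one continuous value per emotion.
regression_targets = intensity.astype(np.float32)

print(binary_targets)
print(regression_targets.shape)
```

Option A would typically be paired with a per-output sigmoid and a binary cross-entropy loss, Option B with a loss such as mean squared error.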

If subsets of emotions were mutually exclusive, then you'd have fewer than 6 outputs, and the outputs corresponding to the mutually exclusive emotions would be multiclass classification outputs.

This would probably be trained as a multi-task problem, which is typically more nuanced to code up correctly and to get to train well.
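To make the multi-task idea a bit more concrete, here is a minimal PyTorch sketch, not a tested recipe: one linear classification head per emotion on top of a shared feature vector, with the per-head cross-entropy losses summed. The class name `MultiHeadEmotionClassifier`, the helper `multi_task_loss`, the feature size of 768, and the choice of 4 classes per emotion are all assumptions made for illustration. The random `features` tensor stands in for pooled wav2vec2 hidden states, and the targets are assumed to be integer class indices (the discrete intensity values mapped to 0, 1, 2, ...).

```python
import torch
from torch import nn


class MultiHeadEmotionClassifier(nn.Module):
    """Hypothetical module: one linear head per emotion over shared features."""

    def __init__(self, feature_dim, classes_per_emotion):
        super().__init__()
        self.heads = nn.ModuleList(
            [nn.Linear(feature_dim, n_classes) for n_classes in classes_per_emotion]
        )

    def forward(self, features):
        # Returns a list of logits, one (batch, n_classes) tensor per emotion.
        return [head(features) for head in self.heads]


def multi_task_loss(logits_per_head, targets):
    # targets: (batch, n_emotions) tensor of integer class indices.
    losses = [
        nn.functional.cross_entropy(logits, targets[:, i])
        for i, logits in enumerate(logits_per_head)
    ]
    # Unweighted sum of the per-emotion losses (weighting is a design choice).
    return torch.stack(losses).sum()


torch.manual_seed(0)
batch_size, feature_dim = 8, 768       # assumed sizes, for illustration only
classes_per_emotion = [4] * 6          # assumption: 4 intensity classes per emotion

features = torch.randn(batch_size, feature_dim)   # stand-in for pooled encoder output
targets = torch.randint(0, 4, (batch_size, 6))    # random example labels

model = MultiHeadEmotionClassifier(feature_dim, classes_per_emotion)
logits = model(features)
loss = multi_task_loss(logits, targets)
loss.backward()
print(loss.item())
```

In a real setup the shared features would come from the fine-tuned encoder, and the individual losses may need weighting if some emotions are much rarer than others.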
