
I am trying to understand the concept of adapter-tuning, prompt-tuning, and prefix-tuning in the context of few-shot learning.

It appears to me that I can apply prompt tuning to a black box language model.

I read that for prompt tuning the entire pre-trained language model is frozen. If that's the case, prompt tuning could be applied to an OpenAI model such as GPT-3 or Codex.

How could I do prompt tuning with OpenAI Codex? I haven't found a way so far.

How are these techniques different from the in-context examples that can be given in few-shot learning?

Can anyone please guide me in the correct direction?

Exploring
  • Your question is off-topic for Stack Overflow. As noted in the [machine-learning](https://stackoverflow.com/tags/machine-learning/info) tag, only implementation questions are on topic. More general questions about approaches, theory or methodology should be directed to a more appropriate venue. – David Buck Dec 12 '22 at 21:29
  • @DavidBuck I am asking here how to prompt tune the Codex model. This is an implementation question in my humble opinion. Please help. I also added a bounty. – Exploring Dec 13 '22 at 01:45

2 Answers


In my understanding, all three concepts mentioned build on a pre-trained model, so in general they should work with the GPT model that underlies OpenAI Codex.

Adapter-tuning involves adding small, task-specific "adapter" modules to the pre-trained model, which can be trained on a few examples to improve performance on the specific task. This is especially interesting if you want to do task adaptation, in my opinion. The idea is to extend the model with small additional layers inserted between the existing ones. You are touching theta (the model parameters).
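A minimal sketch of such an adapter module in PyTorch (following the bottleneck design of Houlsby et al., 2019; the class name and dimensions are my own choices, not from any particular library):

```python
import torch
import torch.nn as nn

class Adapter(nn.Module):
    """Bottleneck adapter: down-project, non-linearity, up-project,
    plus a residual connection so the adapter only learns a small
    correction on top of the frozen layer output."""
    def __init__(self, hidden_dim: int, bottleneck_dim: int = 64):
        super().__init__()
        self.down = nn.Linear(hidden_dim, bottleneck_dim)
        self.up = nn.Linear(bottleneck_dim, hidden_dim)
        self.act = nn.GELU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.up(self.act(self.down(x)))

# During adapter-tuning the pre-trained weights are frozen and only
# the adapter parameters receive gradients, e.g.:
# for p in pretrained_model.parameters():
#     p.requires_grad = False
```

Because the bottleneck is narrow, the number of trainable parameters is a small fraction of the full model's.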

Prompt-tuning involves providing the model with a few examples of the desired output, along with a prompt indicating the task the model should perform. You can also read up on this under the keywords cues or priors. Intuitively, this can be understood as guiding the model explicitly. The idea is to add prior knowledge through the input. You are touching x (the input).
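Read this way, it works with a black-box API because everything happens in the input text; a minimal sketch of assembling such a prompt (the task description and example pairs are hypothetical):

```python
def build_few_shot_prompt(task_description, examples, query):
    """Assemble an in-context prompt: a task description, a few
    solved (input, output) examples, then the new query."""
    parts = [task_description]
    for x, y in examples:
        parts.append(f"Input: {x}\nOutput: {y}")
    parts.append(f"Input: {query}\nOutput:")
    return "\n\n".join(parts)

prompt = build_few_shot_prompt(
    "Translate English to French.",
    [("cat", "chat"), ("dog", "chien")],
    "bird",
)
```

The resulting string is what you would send to the model; no model weights are touched.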

Prefix-tuning involves providing the model with a few examples of text inputs, along with a prefix that indicates the task the model should perform. In my understanding this is basically prompt tuning, but it focuses on the specifics of natural language processing. The idea is to add prior knowledge through the input. You are touching x.

In their paper on OpenAI Codex ("Evaluating Large Language Models Trained on Code") they explain how they fine-tuned and adapted their GPT model to the GitHub data used for Copilot. Read it here.

And gpt-code-clippy is an open-source project that tries to replicate OpenAI Codex - it gets pretty close to what you are trying to do, if I understood your comment correctly.

mrk
  • thanks for the response. But this does not answer the question of how they could be applied to OpenAI Codex or gpt-code-clippy. Also, how are prompt-tuning and prefix-tuning different from few-shot learning? – Exploring Dec 16 '22 at 04:53
  • Few-shot learning is the superordinate category of the approaches explained above. You would need to start with at least some minimal viable code snippet to be able to follow up on your implementation request. – mrk Dec 16 '22 at 07:27
  • also feel free to turn to the open source project I hinted at - for starters. They even link their whole training scripts and have a dedicated section in their readme for that. – mrk Dec 16 '22 at 18:41
  • the link to gpt-code-clippy mentions neither prompt-tuning nor prefix-tuning. I am not asking about an exact implementation. Conceptually, how would someone implement that? Any hint welcome. – Exploring Dec 17 '22 at 19:48
  • "Conceptually, how would someone implement that" is a pretty large-scope question. At this point a pointer to Papers with Code, https://paperswithcode.com/paper/prefix-tuning-optimizing-continuous-prompts, might help you find what you are looking for. – mrk Dec 20 '22 at 10:19

These are alternatives to fine-tuning a model. They are essentially solutions that sit between few-shot learning and full fine-tuning.

The other answer in this SO post is completely wrong. Fine-tuning has nothing to do with either prompt tuning or prefix tuning; these are completely different techniques from fine-tuning.

Correct descriptions of prompt tuning and prefix tuning are given below:

  • Prompt Tuning: k learnable parameters, i.e. continuous token embeddings, are prepended to the input, while the entire pre-trained language model stays frozen.

  • Prefix Tuning: for k positions prepended to the input, additional learnable weights for keys and values are concatenated at every attention layer. Unlike prompt tuning, which learns only input vectors, prefix tuning adds learnable parameters at every layer.
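The difference between the two can be sketched in a few lines of PyTorch; this is a shape-level sketch of the learnable parameters only, not a full implementation, and the class names are mine:

```python
import torch
import torch.nn as nn

class SoftPrompt(nn.Module):
    """Prompt tuning: k learnable continuous token embeddings are
    prepended to the input embeddings; the frozen model's weights
    are never touched."""
    def __init__(self, k: int, embed_dim: int):
        super().__init__()
        self.prompt = nn.Parameter(torch.randn(k, embed_dim) * 0.02)

    def forward(self, input_embeds: torch.Tensor) -> torch.Tensor:
        # input_embeds: (batch, seq_len, embed_dim) from the frozen
        # embedding layer; output: (batch, k + seq_len, embed_dim).
        batch = input_embeds.size(0)
        p = self.prompt.unsqueeze(0).expand(batch, -1, -1)
        return torch.cat([p, input_embeds], dim=1)

class PrefixKV(nn.Module):
    """Prefix tuning: learnable key/value vectors for k prefix
    positions at ONE attention layer; the frozen model would carry
    one such module per layer."""
    def __init__(self, k: int, num_heads: int, head_dim: int):
        super().__init__()
        self.keys = nn.Parameter(torch.randn(num_heads, k, head_dim) * 0.02)
        self.values = nn.Parameter(torch.randn(num_heads, k, head_dim) * 0.02)

    def forward(self, keys, values):
        # keys/values: (batch, num_heads, seq_len, head_dim); the
        # learned prefix is concatenated in front along seq_len.
        batch = keys.size(0)
        pk = self.keys.unsqueeze(0).expand(batch, -1, -1, -1)
        pv = self.values.unsqueeze(0).expand(batch, -1, -1, -1)
        return torch.cat([pk, keys], dim=2), torch.cat([pv, values], dim=2)
```

In both cases only these small parameter tensors are optimized; the pre-trained model is frozen. That is also why neither technique can be applied to a weights-inaccessible, black-box API: both require gradients through the model's embeddings or attention layers.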

Papers that introduced these techniques are given below:

  • Prompt Tuning: Lester, Al-Rfou and Constant (2021), "The Power of Scale for Parameter-Efficient Prompt Tuning".
  • Prefix Tuning: Li and Liang (2021), "Prefix-Tuning: Optimizing Continuous Prompts for Generation".

Exploring