
When calling Create embeddings in the official OpenAI Node library with, for example, the text-embedding-ada-002 model, the embedding returned is an array of about 1536 numbers.

import { Configuration, OpenAIApi } from 'openai'

// Configure the client with your API key
const configuration = new Configuration({ apiKey: process.env.OPENAI_API_KEY })
const openai = new OpenAIApi(configuration)

const parameters = {
  model: 'text-embedding-ada-002',
  input: text,
}

// Make the embedding request and extract the embedding vector
const resp = await openai.createEmbedding(parameters)
const embedding = resp.data.data[0].embedding

I would like to be able to limit the length of the embedding vector that is returned.

Nadsah
1 Answer


You can't change the embedding output dimension. The OpenAI Embeddings API doesn't have a parameter to control this. If you use the text-embedding-ada-002 model, you'll always get a 1536-dimensional embedding vector (i.e., there are 1536 numbers inside).

It's pre-defined, as stated in the official OpenAI documentation:

[Screenshot of the official OpenAI documentation listing the fixed output dimensions of each embedding model]

Note: You don't get 1536 embeddings from the OpenAI Embeddings API. You get one(!) 1536-dimensional embedding. What you can try to do is translate the embedding you get from the OpenAI Embeddings API to a lower-dimensional space. You'll have to do it manually.
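As a rough illustration of "translating the embedding to a lower-dimensional space", here is a minimal sketch in plain Node.js using a seeded random projection (a Johnson–Lindenstrauss-style technique). The function name `reduceDimensions` and the seeded PRNG are hypothetical helpers, not part of the OpenAI library; a proper library (or PCA fitted on many embeddings) would usually give better results.

```javascript
// Deterministic PRNG (mulberry32) so the projection is reproducible
// across calls — every vector must be projected with the SAME matrix.
function mulberry32(seed) {
  return function () {
    seed |= 0; seed = (seed + 0x6D2B79F5) | 0;
    let t = Math.imul(seed ^ (seed >>> 15), 1 | seed);
    t = (t + Math.imul(t ^ (t >>> 7), 61 | t)) ^ t;
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Hypothetical helper: project an embedding down to targetDim dimensions
// with random +1/-1 entries, scaled to roughly preserve distances.
function reduceDimensions(embedding, targetDim, seed = 42) {
  const rand = mulberry32(seed);
  const scale = 1 / Math.sqrt(targetDim);
  const reduced = new Array(targetDim).fill(0);
  for (let j = 0; j < targetDim; j++) {
    for (let i = 0; i < embedding.length; i++) {
      const sign = rand() < 0.5 ? -1 : 1;
      reduced[j] += sign * embedding[i] * scale;
    }
  }
  return reduced;
}

// Example: shrink a 1536-dimensional vector to 256 dimensions.
const fullEmbedding = Array.from({ length: 1536 }, (_, i) => Math.sin(i));
const smallEmbedding = reduceDimensions(fullEmbedding, 256);
console.log(smallEmbedding.length); // 256
```

Note that distances between projected vectors are only approximately preserved, and vectors reduced this way are no longer comparable to full 1536-dimensional embeddings.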

Rok Benko
  • Okay, thank you for the clarification! So if I want a limited number of dimensions, will I have to use some other embedding tool? – Nadsah Mar 14 '23 at 13:46
  • Yes, you need to use any other Python library where you have an option to choose the embedding output dimension. [This](https://www.geeksforgeeks.org/python-word-embedding-using-word2vec/) looks promising. Seems like you can set `vector_size`. – Rok Benko Mar 14 '23 at 13:55
  • Is there a way to take the top n most relevant embeddings out of the 1536 in the output dimension? – kumail Apr 28 '23 at 02:50
  • @kumail You can't simply "take the top n most relevant embeddings out of the 1536". **You don't get 1536 embeddings from the OpenAI Embeddings API. You get one(!) 1536-dimensional embedding.** Anyway, what you can try to do is translate the embedding you get from the OpenAI Embeddings API to a lower-dimensional space. I edited my answer. – Rok Benko Apr 28 '23 at 06:12