4

I've noticed that the AI community refers to various tensors as 512-d, meaning 512-dimensional, where the term 'dimension' seems to mean 512 different float values in the representation of a single datapoint. E.g., a 512-d word embedding means a vector of 512 floats used to represent one English word, e.g. https://medium.com/@jonathan_hui/nlp-word-embedding-glove-5e7f523999f6

But it isn't 512 different dimensions; it's only a 1-dimensional vector. Why is the term 'dimension' used in such a different manner than usual?

When we use the terms conv1d or conv2d, which are convolutions over 1 dimension and 2 dimensions respectively, 'dimension' is used in the typical way it's used in math/sciences; but in the word-embedding context a 1-d vector is said to be a 512-d vector. Or am I missing something?
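
For example, in PyTorch (a minimal sketch I put together to illustrate; the sizes are arbitrary):

```python
import torch

# A "512-d" word embedding: one vector of 512 floats per word.
emb = torch.nn.Embedding(num_embeddings=1000, embedding_dim=512)
v = emb(torch.tensor(3))   # embedding for word index 3
print(v.shape)             # torch.Size([512])
print(v.dim())             # 1 -> a rank-1 tensor, yet it's called "512-d"

# Meanwhile, conv1d/conv2d count dimensions the usual way:
x = torch.randn(8, 16, 100)        # (batch, channels, length) for Conv1d
y = torch.nn.Conv1d(16, 32, 3)(x)  # convolves over 1 spatial dimension
```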

Why this overloaded use of the term 'dimension'? What context determines what 'dimension' means in machine learning, given that the term seems overloaded?

Joe Black
  • Many terms are "overloaded" in that sense, and so is almost every symbol throughout the scientific world. There's only a finite number of useful words for each context, so at some point overloading becomes unavoidable. By the way, NumPy uses the term `shape` and PyTorch uses `size` to denote the `n` of an n-tensor. – a_guest Jun 15 '20 at 19:26
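
A quick sketch of the naming difference mentioned in that comment (illustrative sizes):

```python
import numpy as np
import torch

a = np.zeros((4, 512))
print(a.shape)    # (4, 512)             -- NumPy calls it `shape`
print(a.ndim)     # 2                    -- the n of the n-d array

t = torch.zeros(4, 512)
print(t.size())   # torch.Size([4, 512]) -- PyTorch calls it `size`
print(t.dim())    # 2
```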

3 Answers

3

In the context of word embeddings in neural networks, dimensionality reduction, and many other machine learning areas, it is indeed correct to call a vector (which is, typically, a 1D array or tensor) n-dimensional, where n is usually greater than 2. This is because we usually work in Euclidean space, where a (data) point in an n-dimensional (Euclidean) space is represented as an n-tuple of real numbers (i.e. real n-space, ℝ^n).

Below is an example (see ref at the end) of a (data) point in 3D (Euclidean) space. To represent any point in this space, say d1, we need a tuple of three real numbers (x1, y1, z1).

(figure: a point in 3D (Euclidean) space)

Now, your confusion is why this point d1 is called 3-dimensional rather than a 1-dimensional array. The reason is that it lies, or lives, in this 3D space. The same argument extends to all points in any n-dimensional real space, as is done in the case of embeddings with 300d, 512d, 1024d vectors, etc.
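
As a small illustration (made-up coordinates):

```python
import numpy as np

# A point d1 in 3D Euclidean space needs exactly 3 real coordinates.
d1 = np.array([2.0, -1.5, 0.7])   # (x1, y1, z1)
print(d1.shape)   # (3,) -- a single axis of length 3, i.e. a 1D array
print(d1.ndim)    # 1    -- yet d1 is a point in *3-dimensional* space
```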

However, in nD-array computing frameworks such as NumPy, PyTorch, TensorFlow, etc., these are still 1D arrays, because the extent of such a vector can be described by a single number (its length).

But what if you have more than one data point? Then you have to stack them in some (unique) way, and this is where the need for a second dimension arises. So, if you stack 4 of these 512d vectors vertically, you end up with a 2D array/tensor of shape (4, 512). Note that here we call the array 2D because two integers are required to describe the extent along each axis.
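
A quick sketch of this (random placeholder values):

```python
import numpy as np

v = np.random.rand(512)    # one "512-d" embedding vector
print(v.ndim, v.shape)     # 1 (512,) -- still a 1D array

# Stack 4 such vectors vertically -> a second dimension appears:
batch = np.stack([v, v, v, v])
print(batch.ndim, batch.shape)   # 2 (4, 512)
```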

To understand this better, please refer to my other answer on axis parameter visualization for nD arrays; the visual representation from it is included below.

(figure: axis parameter visualization)


ref: Euclidean space (Wikipedia)

kmario23
2

It is not overloading, but standard usage. What are the elements of a 512-dimensional vector space? They are 512-dimensional vectors, each of which can be represented by 512 floating-point numbers, as in your example. Each such vector spans a 1-dimensional subspace of the 512-dimensional space.
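
A hedged numerical sketch of that distinction (`np.linalg.matrix_rank` is used here just to check the dimension of the span):

```python
import numpy as np

v = np.random.rand(512)   # an element of the 512-dimensional space R^512
print(v.size)             # 512 coordinates are needed to write it down

# All scalar multiples of v span a 1-dimensional subspace:
multiples = np.vstack([v, 2 * v, -3 * v])
print(np.linalg.matrix_rank(multiples))   # 1
```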

When you talk of the dimension of a tensor: a tensor is a multilinear map (roughly speaking; I am omitting the duals) from a product of N vector spaces to the reals. The dimension of a tensor, in this sense, is that N.
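
For instance, a matrix is a tensor with N = 2 in this sense: a bilinear map that takes two vectors and returns a real number (a rough numerical sketch):

```python
import numpy as np

M = np.random.rand(3, 3)   # a "2-dimensional" tensor in this sense (N = 2)
u = np.random.rand(3)
v = np.random.rand(3)

# M as a map from a product of two vector spaces to the reals:
print(u @ M @ v)   # a single real number
```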

Igor Rivin
  • I'm not sure I follow. A 512-d word embedding for a word, say 'cat', is a 1-dimensional vector of length 512. Isn't that different from a 512-dimensional tensor? – Joe Black Jun 15 '20 at 19:34
  • A word embedding lives in a 512-dimensional vector space. A 512-dimensional tensor is a mapping from a product of 512 vector spaces to ℝ. – Igor Rivin Jun 15 '20 at 19:38
2

If you want to be more specific, you need to be clear on the terms dimension, rank, and shape.

The dimensionality of a tensor means the rank, which has a specific definition: the rank is the number of indices. When you see "3-dimensional tensor", you can take that to mean that the tensor has 3 indices, namely `T[i][j][k]`. So a vector has rank 1, a matrix has rank 2, a cube has rank 3, etc.

When you want to specify the size of each dimension, you should prefer the term shape. A 3-dimensional (aka rank-3) tensor can have shape `[10, 20, 30]` if the 0th dimension has 10 values, the 1st dimension has 20 values, and the 2nd dimension has 30 values. (This shape might represent, say, a batch of 10 images, each of shape 20x30.)
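
In code (a small sketch using the shape from the example above):

```python
import numpy as np

T = np.zeros((10, 20, 30))   # rank 3: three indices T[i][j][k]
print(T.ndim)      # 3 -- the rank, i.e. the "dimensionality" of the tensor
print(T.shape)     # (10, 20, 30) -- the size along each dimension
print(T[0][1][2])  # 0.0 -- one element requires exactly 3 indices
```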

Note, though, that when talking about vectors, it is common to say "512-D vector". As you mentioned, this terminology comes up a lot with word embeddings (e.g. "we used 512-D word embeddings"). Since "vector" by definition means rank 1, people will interpret that statement to mean "a structure of rank 1 with 512 values".

You might encounter someone saying "I have a 5-d vector", in which case you'd need to follow up with "wait, do you mean a 5-d tensor or a 1-d vector with 5 values?".

I am not a mathematician, by the way.

stackoverflowuser2010