Suppose tensor and tensor1 are computed transformations of an input, with the shapes given in the code snippet below. The einsum operation performs an Einstein summation to aggregate the results in a specific order.
import tensorflow as tf
tf.random.set_seed(0)
tensor = tf.random.uniform(shape=(2, 2, 2)) # Shape: (n_nodes, n_nodes, n_heads)
tensor1 = tf.random.uniform(shape=(2, 2, 2)) # Shape: (n_nodes, n_heads, n_units)
print(tensor)
print("-" * 50)
print(tensor1)
print("-" * 50)
einsum_tensor = tf.einsum('ijh, jhu -> ihu', tensor, tensor1) # Shape: (n_nodes, n_heads, n_units)
print(einsum_tensor)
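To make the index roles concrete, here is a minimal sketch that writes the same contraction as an explicit multiply-and-sum and compares it with tf.einsum. The sizes (3 nodes, 2 heads, 4 units) and the names reference, product, and explicit are assumptions chosen only for illustration; distinct sizes are used so that a mixed-up axis would show up as a shape error. In this subscript string j is the only index missing from the output, so it is the one that gets summed; h appears in both inputs and in the output, so it is kept as a per-head axis.

```python
import tensorflow as tf

tf.random.set_seed(0)
n_nodes, n_heads, n_units = 3, 2, 4  # illustrative sizes, not from the question

tensor = tf.random.uniform((n_nodes, n_nodes, n_heads))   # indices i, j, h
tensor1 = tf.random.uniform((n_nodes, n_heads, n_units))  # indices j, h, u

reference = tf.einsum('ijh,jhu->ihu', tensor, tensor1)

# Explicit form: out[i, h, u] = sum_j tensor[i, j, h] * tensor1[j, h, u]
# Broadcast both operands to a common (i, j, h, u) shape, then reduce over j.
product = tensor[:, :, :, tf.newaxis] * tensor1[tf.newaxis, :, :, :]
explicit = tf.reduce_sum(product, axis=1)  # sum over j only -> (i, h, u)

max_abs_diff = float(tf.reduce_max(tf.abs(reference - explicit)))
print(explicit.shape, max_abs_diff)
```

If the two results agree up to floating-point error, the subscript string is doing a matrix product over j independently for each head h.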
How should I modify the einsum operation if I add a batch dimension? What is the correct way to perform the same operation with a batch dimension, so that the new shapes are:
tensor shape: (batch_size, n_nodes, n_nodes, n_heads)
tensor1 shape: (batch_size, n_nodes, n_heads, n_units)
output shape: (batch_size, n_nodes, n_heads, n_units)
I thought of the modification below, but I don't know if it's correct. What I understood from the original operation is that j and h are dummy indexes, and i and u are free indexes.
einsum_tensor = tf.einsum('bijh, bjhu -> bihu', tensor, tensor1)
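One way to check the batched subscript string is to compare it against the unbatched einsum applied to each sample separately. The following is a minimal sketch under assumed sizes (batch of 4, 3 nodes, 2 heads, 5 units); the names batched, per_sample, and with_ellipsis are hypothetical and used only for this check. Because b appears in both inputs and in the output, it should behave as a pure batch axis that is never summed.

```python
import tensorflow as tf

tf.random.set_seed(0)
batch_size, n_nodes, n_heads, n_units = 4, 3, 2, 5  # illustrative sizes

tensor = tf.random.uniform((batch_size, n_nodes, n_nodes, n_heads))
tensor1 = tf.random.uniform((batch_size, n_nodes, n_heads, n_units))

# Proposed batched contraction: b is carried through, j is summed.
batched = tf.einsum('bijh,bjhu->bihu', tensor, tensor1)

# Reference: apply the unbatched contraction to each sample, then stack.
per_sample = tf.stack(
    [tf.einsum('ijh,jhu->ihu', tensor[b], tensor1[b]) for b in range(batch_size)],
    axis=0,
)

# Alternative spelling: an ellipsis stands in for any leading batch axes.
with_ellipsis = tf.einsum('...ijh,...jhu->...ihu', tensor, tensor1)

max_abs_diff = float(tf.reduce_max(tf.abs(batched - per_sample)))
max_abs_diff_ellipsis = float(tf.reduce_max(tf.abs(batched - with_ellipsis)))
print(batched.shape, max_abs_diff, max_abs_diff_ellipsis)
```

If both differences are at the level of floating-point noise, the batched string reproduces the original operation sample by sample. The ellipsis form is only a convenience for code that has to accept inputs both with and without the batch axis.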
This guide is the reference I am using (line 228). Note that I have changed f from the guide to u.
P.S. I asked this question on Artificial Intelligence Stack Exchange, but was told that it is a programming question and should be asked here.