Take this as an extended comment. I think the answer mostly depends on your len_mul function. If you want exactly the same result as in your question, you can use a little linear algebra, in particular the fact that multiplying an n×q matrix by a q×m matrix gives an n×m matrix. Here the vector of string lengths is reshaped into an n×1 column and a 1×n row, so their product is the n×n table of pairwise length products.
import pandas as pd
import numpy as np
df = pd.DataFrame({"Data":["A", "Av", "Zcef"]})
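# length of every entry
v = df["Data"].str.len().values
# the lengths reshaped as a (3,1) column matrix (shown for illustration only)
v.reshape((-1,1))
# the lengths reshaped as a (1,3) row matrix (shown for illustration only)
v.reshape((1,-1))
# the original strings, used below as row and column labels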
arr = df["Data"].values
# this is the matrix multiplication
m = v.reshape((-1,1)).dot(v.reshape((1,-1)))
# your expected output
df_out = pd.DataFrame(m,
                      columns=arr,
                      index=arr)
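As a quick sanity check of the shape rule above, here is a minimal sketch showing that the column-times-row product gives the same table as `np.outer` and as plain broadcasting. The names `col`, `row`, `m_dot`, `m_outer` and `m_bcast` are mine and only for illustration; the lengths are hard-coded from the three example strings.

```python
import numpy as np

# lengths of "A", "Av", "Zcef" from the example above
v = np.array([1, 2, 4])

col = v.reshape((-1, 1))  # shape (3, 1)
row = v.reshape((1, -1))  # shape (1, 3)

m_dot = col.dot(row)       # matrix product: (3, 1) times (1, 3) gives (3, 3)
m_outer = np.outer(v, v)   # outer product of v with itself
m_bcast = col * row        # elementwise product, broadcast to (3, 3)

print(m_dot)
# [[ 1  2  4]
#  [ 2  4  8]
#  [ 4  8 16]]
```

All three arrays hold the same values, so which one to use is mostly a matter of readability and speed.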
Update
I agree that Scott Boston's solution is good for the general case of a custom function. But I think you should look for a way to translate your function into something you can compute with linear algebra.
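To illustrate what such a translation can look like, here is a small sketch with two made-up pairwise functions, `len_add` and `len_absdiff`. They are hypothetical and not from the question; the point is only that many pairwise functions of the lengths map onto a NumPy `ufunc.outer` call.

```python
import numpy as np

data = ["A", "Av", "Zcef"]
v = np.array([len(s) for s in data])

# hypothetical pairwise functions, not taken from the question
def len_add(a, b):
    return len(a) + len(b)

def len_absdiff(a, b):
    return abs(len(a) - len(b))

# vectorized equivalents built on ufunc.outer
add_fast = np.add.outer(v, v)
diff_fast = np.abs(np.subtract.outer(v, v))

# slow reference: call the Python function on every pair
add_slow = np.array([[len_add(a, b) for b in data] for a in data])
diff_slow = np.array([[len_absdiff(a, b) for b in data] for a in data])

assert (add_fast == add_slow).all()
assert (diff_fast == diff_slow).all()
```

If your function cannot be expressed this way, the general (slower) approach is still the fallback.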
Here are some timings:
import pandas as pd
import numpy as np
import string
alph = list(string.ascii_letters)
n = 10000
# n random strings of 1 to 9 letters each
data = ["".join(np.random.choice(alph,
                                 np.random.randint(1,10)))
        for i in range(n)]
# drop duplicates and sort, so the labels are unique
data = sorted(set(data))
df = pd.DataFrame({"Data":data})
def len_mul(a,b):
    return len(a) * len(b)
Scott Boston's 1st solution
%%time
idx = pd.MultiIndex.from_product([df['Data'], df['Data']])
df_out1 = pd.Series(idx.map(lambda x: len_mul(*x)), idx).unstack()
CPU times: user 1min 32s, sys: 10.3 s, total: 1min 43s
Wall time: 1min 43s
Scott Boston's 2nd solution
%%time
lens = df['Data'].str.len()
arr = df['Data'].values
df_out2 = pd.DataFrame(np.outer(lens,lens),
                       index=arr,
                       columns=arr)
CPU times: user 99.7 ms, sys: 232 ms, total: 332 ms
Wall time: 331 ms
Vectorized solution (matrix product)
%%time
v = df["Data"].str.len().values
arr = df["Data"].values
m = v.reshape((-1,1)).dot(v.reshape((1,-1)))
df_out3 = pd.DataFrame(m,
                       columns=arr,
                       index=arr)
CPU times: user 477 ms, sys: 188 ms, total: 666 ms
Wall time: 666 ms
Conclusions:
The clear winner is Scott Boston's 2nd solution, with mine about 2x slower. The 1st solution is about 311x and 154x slower than those two, respectively.
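Note that `%%time` is an IPython/Jupyter cell magic, so the timing cells above will not run in a plain script. Below is a rough sketch of how the comparison could be repeated with `time.perf_counter` instead. The helper `timed` and the tiny four-string sample are my own assumptions for illustration, and the pure-Python loop is a stand-in for the slow per-pair approach, not Scott Boston's exact code. It also checks that the approaches agree, since timings are only comparable if the results are identical.

```python
import time
import numpy as np
import pandas as pd

def len_mul(a, b):
    return len(a) * len(b)

# small illustrative sample; the benchmark above uses about 10000 random strings
df = pd.DataFrame({"Data": ["A", "Av", "Zcef", "hello"]})
arr = df["Data"].values
v = df["Data"].str.len().values

def timed(fn):
    """Run fn once and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = fn()
    return result, time.perf_counter() - start

# pure-Python baseline: call len_mul on every pair
loop_out, t_loop = timed(lambda: pd.DataFrame(
    [[len_mul(a, b) for b in arr] for a in arr], index=arr, columns=arr))
# np.outer version
outer_out, t_outer = timed(lambda: pd.DataFrame(
    np.outer(v, v), index=arr, columns=arr))
# column-times-row matrix product version
dot_out, t_dot = timed(lambda: pd.DataFrame(
    v.reshape((-1, 1)).dot(v.reshape((1, -1))), index=arr, columns=arr))

# the three approaches must produce the same table
assert (loop_out.values == outer_out.values).all()
assert (outer_out.values == dot_out.values).all()
print(f"loop: {t_loop:.6f}s  outer: {t_outer:.6f}s  dot: {t_dot:.6f}s")
```

On a sample this small the timings are dominated by overhead; the gap only becomes visible with thousands of strings.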