1

I have column like this shown below:

  Data
0    A
1   Av
2  Zcef

I want desire output with using some function like

def len_mul(a,b):
    return len(a) * len(b)

This function can be replace,

Data  A  Av  Zcef            
A     1   2    4
Av    2   4    8
Zcef   4  8    16

I am able to do this using for loop, But I don't want to use for loop.
I am trying using pd.crosstab, but I am stuck at aggfunc.

len_mul function is important as this is example function for simplicity.

Blaztix
  • 1,223
  • 1
  • 19
  • 28
Akhilesh_IN
  • 1,217
  • 1
  • 13
  • 19

3 Answers3

3

Using your custom function:

def len_mul(a,b):
    return len(a) * len(b)

idx = pd.MultiIndex.from_product([df['Data'], df['Data']])
df_out = pd.Series(idx.map(lambda x: len_mul(*x)), idx).unstack()
df_out

Output:

     A  Av  Zcef
A     1   2     4
Av    2   4     8
Zcef  4   8    16

This was from @piRSquared SO Post


You can use np.outer with pd.DataFrame constructor:

lens = df['Data'].str.len()
pd.DataFrame(np.outer(lens,lens), index = df['Data'], columns=df['Data'])

Output:

Data  A  Av  Zcef
Data             
A     1   2     4
Av    2   4     8
Zcef  4   8    16
Scott Boston
  • 147,308
  • 15
  • 139
  • 187
0

Let's take this as an elaborated comment. I think it mostly depends on your len_mul function. If you want to do exactly the same as in your question you could use a little of linear algebra. In particular the fact that multipl a matrix nxq with a matrix qxm you obtain a matrix nxm.

import pandas as pd
import numpy as np

df = pd.DataFrame({"Data":["A", "Av", "Zcef"]})
# this is the len of every entries
v = df["Data"].str.len().values
# this reshape as a (3,1) matrix
v.reshape((-1,1))
# this reshape as a (1,3) matrix
v.reshape((1,-1))
#
arr = df["Data"].values
# this is the matrix multiplication
m = v.reshape((-1,1)).dot(v.reshape((1,-1)))
# your expected output
df_out = pd.DataFrame(m,
                   columns=arr,
                   index=arr)

Update

I agree that Scott Boston solution is good for the general case of a custom function. But I think you should look for a possible way to translate your function to something you could do using Linear Algebra.

Here some timing:

import pandas as pd
import numpy as np
import string
alph = list(string.ascii_letters)

n = 10000
data = ["".join(np.random.choice(alph,
                                 np.random.randint(1,10))) 
        for i in range(n)]
data = sorted(list(set(data)))
df = pd.DataFrame({"Data":data})


def len_mul(a,b):
    return len(a) * len(b)

Scott Boston 1st solution

%%time
idx = pd.MultiIndex.from_product([df['Data'], df['Data']])
df_out1 = pd.Series(idx.map(lambda x: len_mul(*x)), idx).unstack()

CPU times: user 1min 32s, sys: 10.3 s, total: 1min 43s
Wall time: 1min 43s

Scott Boston 2nd solution

%%time
lens = df['Data'].str.len()
arr = df['Data'].values
df_out2 = pd.DataFrame(np.outer(lens,lens),
                       index=arr,
                       columns=arr)

CPU times: user 99.7 ms, sys: 232 ms, total: 332 ms
Wall time: 331 ms

Vectorial solution

%%time
v = df["Data"].str.len().values
arr = df["Data"].values
m = v.reshape((-1,1)).dot(v.reshape((1,-1)))
df_out3 = pd.DataFrame(m,
                       columns=arr,
                       index=arr)

CPU times: user 477 ms, sys: 188 ms, total: 666 ms
Wall time: 666 ms

Conclusions:

The clear winner is Scott Boston 2nd solution with my 2x slower. The 1st solution is, respectively, 311x and 154x slower.

rpanai
  • 12,515
  • 2
  • 42
  • 64
-1

My suggestion would be building the array with list comprehension instead of a loop.
That way, you can easily create a dataframe with it afterwards.

Example usage:

import pandas as pd

array = ['A','B','C']

def function (X):

    return X**2

L = [[function(X) for X in pd.np.arange(3)] for Y in pd.np.arange(3)]
L
>>> [[0, 1, 4], [0, 1, 4], [0, 1, 4]]

pd.DataFrame(L, columns=array, index=array)

enter image description here

some text on it: https://www.pythonforbeginners.com/basics/list-comprehensions-in-python

hope it helps!

epattaro
  • 2,330
  • 1
  • 16
  • 29