I have a function which takes two inputs and returns a float e.g. my_func('A', 'B') = 0.5.

I have a list of possible inputs: x = ['A', 'B', 'C', 'D', 'E', 'F'].

I want to produce a square matrix (in this case 6 by 6) where the value of each cell is the result of the function with the corresponding row and column labels as inputs. Note that my_func('A', 'A') = 1 and my_func('A', 'B') = my_func('B', 'A'), i.e. the function is symmetric.

I have tried pandas.crosstab(x, x, aggfunc = my_func) but this doesn't work.

Imran
  • Information like `this doesn't work` is useless. Please provide a small sample data set, definition of your function and desired data set. [How to make good reproducible pandas examples](http://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) – MaxU - stand with Ukraine Aug 01 '17 at 14:55

2 Answers

Option 1
brute force

import pandas as pd

xt = pd.DataFrame(index=x, columns=x)

for i in x:
    for j in x:
        # .at sets a single cell; set_value was removed in pandas 1.0
        xt.at[i, j] = my_func(i, j)

Demo

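import pandas as pd

def my_func(i, j):
    # toy symmetric function used only for this demo
    return ord(i) * ord(j)

x = ['A', 'B', 'C', 'D', 'E', 'F']

xt = pd.DataFrame(index=x, columns=x)

for i in x:
    for j in x:
        # .at sets a single cell; set_value was removed in pandas 1.0
        xt.at[i, j] = my_func(i, j)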

xt

      A     B     C     D     E     F
A  4225  4290  4355  4420  4485  4550
B  4290  4356  4422  4488  4554  4620
C  4355  4422  4489  4556  4623  4690
D  4420  4488  4556  4624  4692  4760
E  4485  4554  4623  4692  4761  4830
F  4550  4620  4690  4760  4830  4900

Option 2

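idx = pd.MultiIndex.from_product([x, x])
# each element of idx is a (row, column) tuple; the lambda argument is named
# `pair` so that it does not shadow the list x
pd.Series(idx.map(lambda pair: my_func(*pair)), idx).unstack()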

      A     B     C     D     E     F
A  4225  4290  4355  4420  4485  4550
B  4290  4356  4422  4488  4554  4620
C  4355  4422  4489  4556  4623  4690
D  4420  4488  4556  4624  4692  4760
E  4485  4554  4623  4692  4761  4830
F  4550  4620  4690  4760  4830  4900
piRSquared
  • I was hoping for something other than brute-force. My actual data is 400x400, so having run option 1 for a couple of minutes, my program estimates that it'll take about 10 hours to complete. – Imran Aug 01 '17 at 15:04
  • You may have been hoping, but without knowing what your function is, we have no choice but to blindly apply it to every combination. – piRSquared Aug 01 '17 at 15:06
  • What else would you like to know about the function? Apart from that it takes 2 inputs and outputs a number. – Imran Aug 01 '17 at 15:28
  • @Imran with this information the best you can hope for is to evaluate the function ~80k times (since arguments commute). You probably won't do much better than the above approach, especially if your function evaluation is expensive. – hilberts_drinking_problem Aug 01 '17 at 15:34
  • @Imran For example, suppose your function were to take `A` and `B` then grab the columns from some dataframe and take the dot product. By not telling us the details of the function, we are forced to call the function 80k times as Yakym pointed out. However, by telling us the details, we can turn that around and let you know that all you have to do is to perform a matrix multiplication and be done. The difference in performance would be tremendous. But we won't know what cleverness can be applied because you didn't think it was important to share the details. – piRSquared Aug 01 '17 at 17:06
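The point about commuting arguments can be sketched as follows. This is only an illustration, not something posted in the thread: `symmetric_matrix` is a hypothetical helper, and `my_func` here is a made-up stand-in that happens to satisfy the two properties stated in the question (symmetric, and equal to 1 on the diagonal). It calls the function once per unordered pair and mirrors the result, so 400 labels need 400 * 399 / 2 = 79,800 calls rather than 160,000.

```python
from itertools import combinations

import numpy as np
import pandas as pd


def my_func(i, j):
    # Hypothetical stand-in for the real function: symmetric in its
    # arguments and equal to 1.0 when both arguments are the same.
    return 1.0 / (1 + abs(ord(i) - ord(j)))


def symmetric_matrix(labels, func):
    """Build a square DataFrame, calling func once per unordered pair."""
    n = len(labels)
    # Assumption from the question: func(a, a) == 1, so the diagonal
    # is preset to 1 and never evaluated.
    values = np.eye(n)
    for (r, a), (c, b) in combinations(enumerate(labels), 2):
        # Assumption from the question: func(a, b) == func(b, a),
        # so one call fills both mirrored cells.
        values[r, c] = values[c, r] = func(a, b)
    return pd.DataFrame(values, index=labels, columns=labels)


x = ['A', 'B', 'C', 'D', 'E', 'F']
xt = symmetric_matrix(x, my_func)
```

This roughly halves the work but does not change its order of growth; if each call is expensive, knowing what the function actually computes is still the only route to a vectorized solution.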

For the brute-force method, use the .loc indexer to set each cell, i.e.

for i in x:
    for j in x:
        xt.loc[i,j] = my_func(i,j)

instead of:

for i in x:
    for j in x:
        xt.set_value(i, j, my_func(i, j))

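because set_value was deprecated in pandas 0.21 and removed in pandas 1.0. The .at[] (label-based) or .iat[] (position-based) accessors may also be used; both are intended for getting and setting a single scalar value.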
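As a small sketch of the scalar accessors, using the same toy `my_func` as the Demo in the other answer and a shortened label list: the frame is pre-filled with NaN here only so that it gets a float dtype, which is a choice made for this illustration rather than part of either answer.

```python
import numpy as np
import pandas as pd


def my_func(i, j):
    # Toy function from the Demo above, not the asker's real function.
    return ord(i) * ord(j)


x = ['A', 'B', 'C']

# Pre-fill with NaN so the columns are float64 instead of object.
xt = pd.DataFrame(np.nan, index=x, columns=x)

for i in x:
    for j in x:
        # .at addresses one cell by row label and column label.
        xt.at[i, j] = my_func(i, j)

# .iat addresses one cell by integer position instead of label.
first_row_second_col = xt.iat[0, 1]
```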

Max