Most efficient way to compute a square dataframe in pandas

Question

all,

I'm currently computing a series of square pandas DataFrame objects as part of a bootstrapping algorithm, and although I can compute them correctly, doing so efficiently has thus far eluded me.

Currently the dataframes are computed as follows, using a function func, which varies according to the nature of the data:

import pandas

# idx holds the labels used for both the rows and the columns
frame = pandas.DataFrame(0, index=idx, columns=idx)
for row in idx:
    for col in idx:
        frame.loc[row, col] = func(row, col)

Unfortunately, the square matrices that are built can wind up being quite large (up to 10k cells), so the above can run quite slowly. Is there any way to perform this construction more efficiently than this nested loop method using pandas and/or numpy?

Duplicate of https://stackoverflow.com/questions/39475978/apply-function-to-each-cell-in-dataframe/39476023 — Eric Truett, Apr 11 '20 at 13:32
@Eric Truett, no, this is not the same question, as OP wants to apply a function to the index values, not the cell values themselves. — Arne, Apr 11 '20 at 13:35
It should be much faster if you compute it in NumPy as an `ndarray`, without using pandas. You could still convert the result to a DataFrame in the end. — Arne, Apr 11 '20 at 13:38

score 0 · Accepted Answer · answered Apr 11 '20 at 14:07

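It should be faster to build the values in NumPy, because every `frame.loc[row, col] = ...` assignment carries pandas indexing overhead. You can use nested list comprehensions instead of the explicit loops: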

import numpy as np

# example function
def func(i, j):
    return 10 * i + j

# example index
idx = [0, 1, 2, 3]

frame = np.array([[func(row, col) for col in idx] for row in idx])
frame

array([[ 0,  1,  2,  3],
       [10, 11, 12, 13],
       [20, 21, 22, 23],
       [30, 31, 32, 33]])

You can of course convert the result to a DataFrame if necessary:

import pandas as pd
# pass idx so the row and column labels from the question are kept
frame = pd.DataFrame(frame, index=idx, columns=idx)
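The list comprehension above still calls func once per cell. If func happens to be built only from operations that NumPy can apply elementwise (an assumption that holds for the example function here, but may not hold for yours), you could instead pass whole arrays and let broadcasting fill the matrix in one call. A minimal sketch, where `labels`, `rows`, `cols` and `values` are names chosen just for illustration:

```python
import numpy as np
import pandas as pd

# example function from above; it works on arrays as well as on scalars
def func(i, j):
    return 10 * i + j

# example index
idx = [0, 1, 2, 3]
labels = np.asarray(idx)

# rows has shape (n, 1) and cols has shape (1, n),
# so func(rows, cols) broadcasts to an (n, n) array
rows = labels[:, None]
cols = labels[None, :]
values = func(rows, cols)

# keep idx as both the row and the column labels
frame = pd.DataFrame(values, index=idx, columns=idx)
```

If func contains Python-level branching or other logic that only accepts scalars, this will not work as written, and the list comprehension is the safer choice.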

Exactly what I needed, and considerably faster than the nested loop, thanks — JWWalthers, Apr 11 '20 at 16:06
