All,

I'm currently computing a series of square pandas DataFrame objects as part of a bootstrapping algorithm. I can compute them correctly, but computing them efficiently has so far eluded me.

Currently the dataframes are computed as follows, using a function func, which varies according to the nature of the data:

import pandas

frame = pandas.DataFrame(0, index=idx, columns=idx)
for row in idx:
    for col in idx:
        frame.loc[row, col] = func(row, col)

Unfortunately, the square matrices that are built can wind up being quite large (up to 10k cells), so the above can run quite slowly. Is there any way to perform this construction more efficiently than this nested loop method using pandas and/or numpy?

JWWalthers
  • Duplicate of https://stackoverflow.com/questions/39475978/apply-function-to-each-cell-in-dataframe/39476023 – Eric Truett Apr 11 '20 at 13:32
  • @Eric Truett, no, this is not the same question, as OP wants to apply a function to the index values, not the cell values themselves. – Arne Apr 11 '20 at 13:35
  • It should be much faster if you compute it in NumPy as an `ndarray`, without using pandas. You could still convert the result to a DataFrame in the end. – Arne Apr 11 '20 at 13:38

1 Answer

It should be faster in NumPy, and you can use nested list comprehensions instead of the explicit loops:

import numpy as np

# example function
def func(i, j):
    return 10 * i + j

# example index
idx = [0, 1, 2, 3]

frame = np.array([[func(row, col) for col in idx] for row in idx])       
frame
array([[ 0,  1,  2,  3],
       [10, 11, 12, 13],
       [20, 21, 22, 23],
       [30, 31, 32, 33]])
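
If func is built only from operations that NumPy can apply elementwise (arithmetic, comparisons, np.* functions), the comprehension can probably be dropped as well by broadcasting a column of row labels against a row of column labels. A minimal sketch, assuming the example func above and a numeric idx; the name labels is introduced here for illustration, and this will not work for a func that branches on its scalar arguments or calls pure-Python code:

```python
import numpy as np

# example function from above; assumed to use only elementwise-friendly operations
def func(i, j):
    return 10 * i + j

# example index
idx = [0, 1, 2, 3]
labels = np.asarray(idx)

# labels[:, None] has shape (n, 1) and labels[None, :] has shape (1, n);
# broadcasting lets func run once on the whole (n, n) grid instead of once per cell
frame = func(labels[:, None], labels[None, :])
```

Here func is called a single time with array arguments, so the per-cell Python overhead disappears. Whether that is possible depends entirely on how the real func is written.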

You can of course convert the result to a DataFrame if necessary:

import pandas as pd
frame = pd.DataFrame(frame, index=idx, columns=idx)  # keep idx as row and column labels
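
For a func that only accepts scalars, one option is to wrap it with np.frompyfunc and call the resulting ufunc's outer method, then label the result with idx. This is a sketch rather than a guaranteed speed-up: func is still called once per cell in Python, the result has dtype object, and the helper name build_frame is made up for illustration.

```python
import numpy as np
import pandas as pd

def build_frame(func, idx):
    """Hypothetical helper: evaluate func(row, col) for every pair of labels in idx."""
    # frompyfunc wraps a scalar Python function as a ufunc with 2 inputs and 1 output
    pairwise = np.frompyfunc(func, 2, 1)
    # .outer calls the wrapped function for every (row, col) pair; dtype is object
    values = pairwise.outer(idx, idx)
    return pd.DataFrame(values, index=idx, columns=idx)

# example function and index from above
def func(i, j):
    return 10 * i + j

idx = [0, 1, 2, 3]
frame = build_frame(func, idx)
```

If the values are numeric, something like frame.astype(float) afterwards should give a regular numeric DataFrame.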
Arne