1

Is there a way to create/generate a Pandas DataFrame from scratch, such that each record follows a specific mathematical function?

Background: In Financial Mathematics, very basic financial-derivatives (e.g. calls and puts) have closed-form pricing formulas (e.g. Black Scholes). These pricing formulas can be called stochastic functions (because they involve a random term)

I'm trying to create a Monte Carlo simulation of a stock price (and subseuqently an option payoff and price based on the stock price). I need, say, 1000 paths (rows) and 100 time-steps (columns). I want to "initiate" a dataframe that is 1000 by 100 and follows a stochastic equation.

# Psuedo-code
MonteCarloDF = DataFrame(rows=1000, columns=100, customFunc=TRUE,
        appliedBy='by column', 
        FUNC={s0=321; 
              s_i=prev*exp(r-q*sqrt(sigma))*T + 
                 (etc)*NormDist(rnd())*sqr(deltaT)}
        )

Column 0 in every row would be 321, and each subsequent column would be figured out based on the FUNC above.

This is an example of something similar done in VBA

Function MonteCarlo_Vanilla_call(S, K, r, q, vol, T, N)

sum = 0
payoff = 0

For i = 1 To N
 S_T = S * Exp((r - q - 0.5 * vol ^ 2) * T + vol * Sqr(T) * Application.NormSInv(Rnd()))
 payoff = Application.Max(S_T - K, 0)
 sum = sum + payoff
Next i

MonteCarlo_Vanilla_call = Exp(-r * T) * sum / N

End Function

Every passed in variable is a constant. In my case, I want each next column in the same row to be just like S_T in the VBA code. That's really the only like that matters. I want to apply a function like S_T = S * Exp((r - q - 0.5 * vol ^ 2) * T + vol * Sqr(T) * Application.NormSInv(Rnd())) . Each S_T is the next column in the same row. There's N columns making one simulation. I will have, for example, 1000 simulations.

321     | 322.125 | 323.277 | ... | column 100 value
321     | 320.704 | 319.839 | ... | column 100 value
321     | 321.471 | 318.456 | ... | column 100 value
...
row 1000| etc     | etc     | ... | value (1000,100)
VISQL
  • 1,960
  • 5
  • 29
  • 41
  • What have you tried and what _specific_ issues have you encountered? – sophros May 30 '19 at 09:34
  • The custom function require a previous value. I don't know what to __try__, or where to start with this. Can you give an example with a simpler function that uses the previous value in the same row? Is there another way to do this? – VISQL May 30 '19 at 10:06
  • Could you provide some context around some of the variables in the equation..? or do you have a link that might help? eg, what is `q`, `r`, `sigma`, `T` here? – Chris Adams May 30 '19 at 10:15
  • The variables will just contain constants defined beforehand. My specific equation isn't determine, but it will be similar to https://images.app.goo.gl/GNRYPawNQv7kwvo5A Black Scholes. – VISQL May 30 '19 at 10:30

2 Answers2

2

IIUC, you could create your own function to generate a DataFrame. Within the function iterate using .iloc[:, -1] to use the last created column.

We'll also use numpy.random.randn to generate an array of normally distributed random values.

You may need to adjust the default values of your variables, but the idea would be something like:

Function

import pandas as pd
import numpy as np
from math import exp, sqrt

def monte_carlo_df(nrows,
                   ncols,
                   col_1_val,
                   r=0.03,
                   q=0.5,
                   sigma=0.002,
                   T=1.0002,
                   deltaT=0.002):
    """Returns stochastic monte carlo DataFrame"""

    # Create first column
    df = pd.DataFrame({'s0': [col_1_val] * nrows})

    # Create subsequent columns
    for i in range(1, ncols):
        df[f's{i}'] = (df.iloc[:, -1] * exp(r - q * sqrt(sigma)) * T
                       + (np.random.randn(nrows) * sqrt(deltaT)))
    return df

Usage example

df = monte_carlo_df(nrows=1000, ncols=100, col_1_val=321)
Chris Adams
  • 18,389
  • 4
  • 22
  • 39
  • This one will take a while to sink in. Very interesting that you calculated the next step in each path, versus all steps in one path. So you're creating an entire column at one time by specifying `randn(nrows)`, correct? But each random number will still be from the standard normal distribution?? Floored. – VISQL May 30 '19 at 11:40
  • Yes, that's exactly right @VISQL `randn(nrows)` will return an `ndarray` the same shape as a column from the `DataFrame`. So they can be easily broadcast together in a mathematical expression – Chris Adams May 30 '19 at 11:43
0

To me your problem is a specific version of the following one: Pandas calculations based on other rows. Since you can pivot it shouldn't matter if we are talking rows or columns.

There is also a question relating to calculations using columns: Pandas complex calculation based on other columns which has a good suggestion of using a rolling window (rolling function) or using shift function: Calculate the percentage increase or decrease based on the previous column value of the same row in pandas dataframe

Speed considerations of similar calculations (or numpy vs pandas discussion): Numpy, Pandas: what is the fastest way to calculate dataset row value basing on previous N values?

To sum it all up - it seems that your question is somewhat of a duplicate.

sophros
  • 14,672
  • 11
  • 46
  • 75