Pandas: creating an indexed time series [starting from 100] from returns data

Question

I have data on logarithmic returns of a variable in a Pandas DataFrame. I would like to turn these returns into an indexed time series which starts from 100 (or any arbitrary number). This kind of operation is very common for example when creating an inflation index or when comparing two series of different magnitude:

So the first value in, say, Jan 1st 2000 is set to equal 100 and the next value in Jan 2nd 2000 equals 100 * exp(return_2000_01_02) and so on. Example below:

I know that I can loop through rows in a Pandas DataFrame using .iteritems() as presented in this SO question: iterating row by row through a pandas dataframe

I also know that I can turn the DataFrame into a numpy array, loop through the values in that array and turn the numpy array back to a Pandas DataFrame. The .as_matrix() method is explained here: http://pandas.pydata.org/pandas-docs/version/0.13.1/generated/pandas.Series.html

An even simpler way to do it is to iterate the rows by using the Python and numpy indexing operators [] as documented in Pandas indexing: http://pandas.pydata.org/pandas-docs/stable/indexing.html

The problem is that all these solutions (except for the iteritems) work "outside" Pandas and are, according to what I have read, inefficient.

Is there a way to create an indexed time series using purely Pandas? And if not, could you, please, suggest the most efficient way to do this. Finding solutions is surprisingly difficult, because index and indexing have a specific meaning in Pandas, which I am not after this time.

The solution to your problem with base 1 is `(1 + df.return).cumprod()`, where `df` is your dataframe — Sergey Bushmanov, Mar 22 '16 at 12:34

Colonel Beauvel · Accepted Answer · 2016-03-22T12:51:18.353

7

You can use a vectorized approach instead of a loop/iteration:

import pandas as pd
import numpy as np

df = pd.DataFrame({'return':np.array([np.nan, 0.01, -0.02, 0.05, 0.07, 0.01, -0.01])})

df['series'] = 100*np.exp(np.nan_to_num(df['return'].cumsum()))

#In [29]: df
#Out[29]:
#   return      series
#0     NaN  100.000000
#1    0.01  101.005017
#2   -0.02   99.004983
#3    0.05  104.081077
#4    0.07  111.627807
#5    0.01  112.749685
#6   -0.01  111.627807

edited Mar 22 '16 at 12:51

answered Mar 22 '16 at 12:34

Colonel Beauvel

30,423
11
47
87

Thanks! Works perfectly. I had to modify it a bit, because my data runs from newest to oldest. It was as simple as: df["series"] = 100 * np.exp(df[::-1].cumsum())[::-1]. Also, I didn't need the np.nan_to_num, because I had done a .dropna() earlier. – Crebit Mar 22 '16 at 13:37
great! the vectorized approach is the way to go - when possible - in pandas/numpy. – Colonel Beauvel Mar 22 '16 at 15:05
Do you know how to change your code to make #6 equal to 100 and work backwards? – Oniropolo Jan 11 '19 at 14:29

meinerst · Answer 2 · 2021-04-14T12:36:04.033

@Crebit

I have created a framework to index prices in pandas quickly!

See on my github below for the file: https://github.com/meinerst/JupyterWorkflow

It shows how you can pull the prices from yahoo finance and or show how you can work with your excisting dataframes.

I cant show the dataframe tables here. If you want to see them, follow the github link.

Indexing financial time series (pandas)

This example uses data pulled from yahoo finance. If you have a dataframe from elsewhere, go to part 2.

Part 1 (Pulling data)

For this, make sure the yfinance package is installed.

#pip install yfinance

import pandas as pd
import numpy as np
import yfinance as yf
import matplotlib.pyplot as plt
import datetime as dt

Insert the yahoo finance tickers into the variable 'tickers'. You can choose as many as you like.

tickers =['TSLA','AAPL','NFLX','MSFT']

Choose timeframe.

start=dt.datetime(2019,1,1)
end= dt.datetime.now()

In this example, the 'Adj Close' column is selected.

assets=yf.download(tickers,start,end)['Adj Close']

Part 2 (Indexing)

To graph a comparable price development graph the assets data frame needs to be indexed. New columns are added for this purpose. First the indexing row is determined. In this case the initial prices.

assets_indexrow=assets[:1]

New columns are added to the original dataframe with the indexed price developments.

Insert your desired indexing value below. In this case, it is 100.

for ticker in tickers:
    assets[ticker+'_indexed']=(assets[ticker]/ assets_indexrow[ticker][0])*100

The original columns of prices are then dropped

assets.drop(columns =tickers, inplace=True)

Graphing the result.

plt.figure(figsize=(14, 7))
for c in assets.columns.values:
    plt.plot(assets.index, assets[c], lw=3, alpha=0.8,label=c)
plt.legend(loc='upper left', fontsize=12)
plt.ylabel('Value Change')

I cant insert the graph due to limited reputation points but see here: Indexed Graph

Pandas: creating an indexed time series [starting from 100] from returns data

2 Answers2

Indexing financial time series (pandas)