I'm trying to find the annual percent change for each ticker in fundamentals_df
, by looking at pricing data in another dataframe (prices_df
). However, the dataset is quite large, and a full loop can take over 3 hours. I've been looking into vectorization as an alternative, but I'm quite new, and am looking for advice on how I could apply it to my particular problem. Lot's of solutions out there are for single dataframe, and not cases where you have to use data from one dataframe to access another.
Here's an outline of the steps I'm taking right now:
- I am looping though each row in
fundamentals_df
, and getting theticker
, andyear
attributes. - Then going into
prices_df
and finding all rows with the sameticker
- Then further filtering by getting the the rows between a given year, and the next year (year + 1).
- Finally, getting the first and last values, and calculating the percent change.
The Dataset
prices_df:
ticker object
date datetime64[ns]
close float64
fundamentals_df has a handful of fields, but these are the two most relevant:
ticker object
year object
The ticker
attribute is unique, and is shared by both dataframes.
The code
Here's the loop that goes through fundamentals_df
, getting each ticker, and year. Then using the ticker to perform an initial filter on prices_df
, just getting the rows relevant to the current ticker.
current_ticker = ''
returns_dict = {}
for index, row in fundamentals_df.iterrows():
if row.ticker != current_ticker:
# This first "if" statement determines if we're working on a new ticker.
# If we are, then get the new ticker's price history, and pretty it up.
current_ticker = row.ticker
# Filter prices_df by ticker
temp_price_df = prices_df.query(f'ticker == "{current_ticker}"')
# Make sure it's sorted by date. SUPER IMPORTANT since we calculate pct_change using first and last rows.
temp_price_df = temp_price_df.sort_values(by='date')
else:
# If we're working on the same ticker, then no need to revisit prices_df
pass
annual_return = calc_annual_return(temp_price_df, row.year)
returns_dict[index] = annual_return
Here's the function to calculate annual return. It is responsible for taking a temporary dataframe with all the prices for a given ticker, and filter out all the dates that aren't relevant. then use the first and last values remaining to calculate percent change.
def calc_annual_return(prices, year):
# "year" represents the end of the period;
# we want the stocks performance for the following year, so we add 1
year = int(year) + 1
# Just get the slice of time we meed
prices = prices.loc[prices['date'].between(f'{year}-01-01', f'{year + 1}-01-01')]
try:
start_price = prices.iloc[0].closeadj
end_price = prices.iloc[-1].closeadj
pct_change = (end_price - start_price) / start_price
except:
return 'n/a'
return pct_change
I know this is quite a long post, I hope it's thorough, and clear. Any help would be greatly appreciated.