I will try to explain my problem as best I can. I'm having performance issues, and I wonder if there's a better way of setting this up.
I have a database with roughly 150 years' worth of year-by-year data. Each row has about 10 columns.
I am running a "simulation" on every 30-year span (we'll call each 30-year chunk a "Cycle"). So, Cycle 1 would be years 1-30, Cycle 2 would be years 2-31, Cycle 3 years 3-32. You get the idea?
all_data = DataPoint.objects.all().yearly()
cycle_length = 30

simulation_cycles = []
previous_data_points = []

for data_point in all_data:
    # Fill up the first 30-year window before building any Cycles
    if len(previous_data_points) < cycle_length:
        previous_data_points.append(data_point)
        continue

    # Build a Cycle from the current 30-point window
    simulation_cycles.append(Cycle(previous_data_points))

    # Slide the window forward by one year
    if len(previous_data_points) == cycle_length:
        previous_data_points.pop(0)

    previous_data_points.append(data_point)
So, for each 30-year Cycle, I feed the Cycle constructor the 30 queryset items to initialize its data. The problem is, when I use Django's connection.queries to list out what's going on, it looks like it's running 3000+ queries and taking 10-12 seconds, which is pretty long for what it's doing.
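Here's roughly how I'm counting the queries, for reference (just a sketch; run_simulation() below is a stand-in for the view code that builds the cycles, and DEBUG is on so connection.queries gets populated):

from django.db import connection, reset_queries

reset_queries()
run_simulation()  # stand-in for the view code that builds the Cycles
print(len(connection.queries))                            # comes out at 3000+ for me
print(sum(float(q["time"]) for q in connection.queries))  # roughly 10-12 seconds total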
In the connection.queries list, I see that it's making 30 separate calls (one for each data point) when I pass the data into Cycle, each using "WHERE EXTRACT(MONTH FROM ...)", which I believe is my .yearly() filter being applied. But it's also logging a query inside Cycle() where it actually looks up the data point by date.
When I pass around parts of a queryset like that, is the data being cached? Is my app really running 3000 queries (hitting the database each time), or is it one big query from my DataPoint.objects.all().yearly() call, with everything else happening in memory?
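For what it's worth, here's what I think forcing a single evaluation would look like (just a sketch of what I mean, not what I'm running now):

# list() evaluates the queryset once and keeps the rows in memory
all_data = list(DataPoint.objects.all().yearly())

cycle_length = 30
simulation_cycles = [
    Cycle(all_data[i:i + cycle_length])              # plain list slicing, no extra queries
    for i in range(len(all_data) - cycle_length + 1)
]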
I'm trying to understand why this is running so slowly. Part of the problem, I assume, is that I'm creating this huge list of objects: 120 "Cycles", each holding 30 individual "Year" objects that carry data of their own (and the calculations that produce that data) for later use. Is having that many objects in memory going to hurt me, or is that small potatoes?
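For scale, the rough object count I'm talking about:

# Back-of-the-envelope count of what ends up in memory (not profiled)
years_of_data = 150
cycle_length = 30
num_cycles = years_of_data - cycle_length + 1   # ~121 possible windows (~120 with my loop)
total_segments = num_cycles * cycle_length      # ~3,630 small Segment objects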
Edit:
from dateutil.relativedelta import relativedelta
from decimal import Decimal as D

class Cycle:
    def __init__(self, data_points):
        self.range_start = data_points[0].data_date
        self.range_end = data_points[-1].data_date
        self.start_CPI = data_points[0].cpi
        self.years = relativedelta(self.range_end, self.range_start).years + 1

        self.sim = []
        for i in range(0, self.years):
            data_point = data_points[i]
            self.sim.append(Segment(
                date=self.range_start + relativedelta(years=i),
                start_CPI=self.start_CPI,
                yearly_equities_growth=data_point.yearly_equities_growth,
                cpi=data_point.cpi,
                dividend=data_point.dividend,
                s_and_p_composite=data_point.s_and_p_composite,
                long_interest_rate=data_point.long_interest_rate
            ))
class Segment:
    def __init__(self, date, start_CPI, yearly_equities_growth, cpi, dividend,
                 s_and_p_composite, long_interest_rate):
        self.start_CPI = D(start_CPI)
        self.date = date
        self.portfolio = {
            "start": None,
            "end": None,
            "inflation_adjusted_start": None,
            "inflation_adjusted_end": None,
            "fees": None
        }
        self.spending = None
        self.inflation_adjusted_spending = None
        self.equities = {"start": None, "growth": None, "val": None}
        self.bonds = {"start": None, "growth": None, "val": None}
        self.gold = {"start": None, "growth": None, "val": None}
        self.cash = {"start": None, "growth": None, "val": None}
        self.dividends = {"growth": None, "val": None}
        self.fees = None
        self.yearly_equities_growth = D(yearly_equities_growth)
        self.cumulative_inflation = 1 + (D(cpi) - self.start_CPI) / self.start_CPI
        self.sum_of_adjustments = None
        self.cpi = cpi
        self.dividend = dividend
        self.s_and_p_composite = s_and_p_composite
        self.long_interest_rate = long_interest_rate