I'm having trouble representing data for a project in a database-appropriate way. To help, I've set up a SQL Fiddle for the example at: http://sqlfiddle.com/#!2/6d01c2/14/0
At its simplest, I have three tables: Fund, FundReturn, and FactorReturn. Fund has many monthly returns via FundReturn. Fund also has a Region (like US, or Foreign) and these regions have what's called a FactorReturn. So the FactorReturn's relation to Fund isn't direct but rather through the Region the Fund belongs to.
**Fund**
fund_symbol | varchar(5) | PRI
region_key | varchar(255) |
**FundReturn**
fund_symbol | varchar(255) | PRI
return_month | int(10) | PRI
return_value | decimal(5,4) |
**FactorReturn**
region_key | varchar(255) | PRI
factor_key | varchar(255) | PRI
return_month | int(10) | PRI
return_value | decimal(5,4) |
The goal is to pull the fund's returns for a given period of time as well as the corresponding factor returns to perform some analysis. The final representation would look something like this:
fund_symbol | month | fund_return | factor_ret_1 | factor_ret_2 | factor_ret_3
VTI | 201001 | 0.0100 | 0.0200 | -0.0100 | 0.0000
VTI | 201002 | 0.0500 | 0.0300 | 0.0300 | 0.0010
VTI | 201003 | 0.0300 | 0.0100 | -0.0200 | 0.0020
Additionally, I'm using the PHP Doctrine ORM and have mapped the data to be able to use methods as follows:
$fund = $em->find('VTI');
$fund->getFundReturns(); // Would return array with FundReturn objects.
$fund->getFactorReturns(); // Would return array with FactorReturn objects.
Problems
Due to the nature of the data, trying to query for both fund returns as well as factor returns in one query exponentially increases the amount of rows returned. This is exacerbated once I try to use this in the ORM as it is impossible to build the object graph with a broad date range. Using multiple queries is my current solution, but this is tricky when using an ORM as the accessing the relations incorrectly from my code can trigger massive SQL queries.
The query at http://sqlfiddle.com/#!2/6d01c2/14/0 shows how 432 rows are returned when querying for just 12 months.
Questions
So my questions are:
- Is there a better way to represent this data in a database?
- What's the right way to query for data in SQL? In an ORM?