8

I'm looking for a way to output a selected related record for each record in a table in MySQL. I'll explain further...

I have 2 tables currencies and exchange_rates. The tables are joined by a currency_code field and each currency record has multiple related exchange rate records, each exchange rate record represents a different day. So there is a 1:many relationship between currencies and exchange_rates.

I want to retrieve a full record from the exchange_rates table for each currency but with the ability to define specific criteria as to which related record to select. Not just the most recent exchange_rate for each currency but maybe the most recent exchange_rates record for each currency that has the field criteria_x=NULL.

It's a shame that you can't use LIMIT within a derived table otherwise something like this would be a neat and readable solution...

SELECT `currencies`.`currency_code`, `currencies`.`country`, `exchange_rates`.`id`,
       FROM_UNIXTIME(`exchange_rates`.`datestamp`), `rate` 
FROM `currencies` 
INNER JOIN (
SELECT `id`, `currency_code`, `invoice_id`, `datestamp`, `rate` 
FROM `exchange_rates` 
WHERE `criteria_x`=NULL AND `criteria_y` LIKE 'A' 
ORDER BY `datestamp` DESC
LIMIT 0, 1
) AS `exchange_rates` ON `currencies`.`currency_code`=`exchange_rates`.`currency_code`
ORDER BY `currencies`.`country`

The LIMIT clause is applied to the parent query not the derived table.

This is the only way I've found to do this...

SELECT `currencies`.`currency_code`, `currencies`.`country`, 
FROM_UNIXTIME( SUBSTRING_INDEX( SUBSTRING_INDEX(`exchange_rates`.`concat`, '-', 1), '-', -1)) AS `datestamp`,
SUBSTRING_INDEX( SUBSTRING_INDEX(`exchange_rates`.`concat`, '-', 2), '-', -1) AS `id`, 
SUBSTRING_INDEX( SUBSTRING_INDEX(`exchange_rates`.`concat`, '-', 3), '-', -1) AS `rate` 
FROM `currencies`
INNER JOIN (
SELECT `currency_code`, MAX(CONCAT_WS('-', `datestamp`, `id`, `rate`)) AS `concat`
FROM `exchange_rates` 
WHERE `criteria_x`=NULL AND `criteria_y` LIKE 'A' 
GROUP BY `exchange_rates`.`currency_code`
) AS `exchange_rates` ON `currencies`.`currency_code`=`exchange_rates`.`currency_code`
ORDER BY `currencies`.`country`

So concatenating a bunch of fields together and running a MAX() on it to get my sort order within the group, then parsing those fields out in the parent query with SUBSTRING_INDEX(). The problem is that this method only works when I can use a MIN() or MAX() on the concatenated field. It wouldn't be ideal if I wanted to sort a string or sort by multiple criteria but limit to a single record.

Also it causes me physical pain to have to resort to horrible string manipulation to get the data I want from a relational database — there has to be a better way!

Anyone got any suggestions of a better method?

Jonathan Leffler
  • 730,956
  • 141
  • 904
  • 1,278
batfastad
  • 1,943
  • 3
  • 27
  • 37

4 Answers4

3

There are a few general issues to discuss (briefly) before trying to provide an answer.

Your first query is:

SELECT `currencies`.`currency_code`, `currencies`.`country`, `exchange_rates`.`id`,
       FROM_UNIXTIME(`exchange_rates`.`datestamp`), `rate` 
FROM `currencies` 
INNER JOIN (
SELECT `id`, `currency_code`, `invoice_id`, `datestamp`, `rate` 
FROM `exchange_rates` 
WHERE `criteria_x`=NULL AND `criteria_y` LIKE 'A' 
ORDER BY `datestamp` DESC
LIMIT 0, 1
) AS `exchange_rates` ON `currencies`.`currency_code`=`exchange_rates`.`currency_code`
ORDER BY `currencies`.`country`
  1. I don't think you need to use as many back-quotes as you are using. They aren't exactly wrong, but I'm not going to type them in my answer.
  2. The SQL standard does not sanction the criteria_x = NULL notation; that should be written as criteria_x IS NULL. MySQL may allow it; as long as you are aware that it is non-standard, it is OK for you to use.
  3. The criterion LIKE 'A' is not sensible if it contains no metacharacters (% or _ in standard SQL). You'd be better off with simple equality: = 'A'.

Your question says:

I want to retrieve a full record from the exchange_rates table for each currency but with the ability to define specific criteria as to which related record to select. Not just the most recent exchange rate for each currency, but maybe the most recent exchange rate for each currency that has the field criteria_x IS NULL.

So, you want to select the most recent exchange rate record for each currency that meets the required other criteria. We can assume that there is a unique constraint on the combination of currency_code and datestamp in the exchange rate table; this means that there will always be at most one matching row. You've not specified what should be shown if there is no matching row; an inner join will simply not list that currency, of course.

With SQL queries, I usually build and test the overall query in steps, adding extra material to the previously developed queries that are known to work and produce the right output. If it is simple and/or I've collected too much hubris, I'll try a complex query first, but when (nemesis) it doesn't work, then I go back to the build and test process. Think of it as Test Driven (Query) Development.

Stage 1: Exchange rate records that match specified criteria

SELECT id, currency_code, invoice_id, datestamp, rate 
  FROM exchange_rates 
 WHERE criteria_x IS NULL AND criteria_y = 'A' 
 ORDER BY currency_code, datestamp DESC

Stage 2: Most recent exchange rate time for each currency that matches specified criteria

SELECT currency_code, MAX(datestamp) 
  FROM exchange_rates 
 WHERE criteria_x IS NULL AND criteria_y = 'A' 
 GROUP BY currency_code

Stage 3: Exchange rate record for most recent exchange rate time for each currency that matches specified criteria

SELECT x.id, x.currency_code, x.invoice_id, x.datestamp, x.rate 
  FROM exchange_rates AS x
  JOIN (SELECT currency_code, MAX(datestamp) AS datestamp
          FROM exchange_rates 
         WHERE criteria_x IS NULL AND criteria_y = 'A' 
         GROUP BY currency_code
       ) AS m
    ON x.currency_code = m.currency_code AND x.datestamp = m.datestamp

Stage 4: Currency information and exchange rate record for most recent exchange rate time for each currency that matches specified criteria

This requires the joining the currencies table with the output of the previous query:

SELECT c.currency_code, c.country, r.id,
       FROM_UNIXTIME(r.datestamp), r.rate
  FROM currencies AS c 
  JOIN (SELECT x.id, x.currency_code, x.invoice_id, x.datestamp, x.rate 
          FROM exchange_rates AS x
          JOIN (SELECT currency_code, MAX(datestamp) AS datestamp
                  FROM exchange_rates 
                 WHERE criteria_x IS NULL AND criteria_y = 'A' 
                 GROUP BY currency_code
               ) AS m
            ON x.currency_code = m.currency_code AND x.datestamp = m.datestamp
       ) AS r
    ON c.currency_code = r.currency_code
 ORDER BY c.country

Except that Oracle only allows ') r' instead of ') AS r' for table aliases and the use of FROM_UNIXTIME(), I believe that should work correctly with the current version of almost any SQL DBMS you care to mention.

Since the invoice ID is not returned in the final query, we can remove that from the select-list of the middle query. A good optimizer might do that automatically.

If you want to see the currency information even if there is no exchange rate that matches the criteria, then you need to change the JOIN in the outermost query to a LEFT JOIN (aka LEFT OUTER JOIN). If you only want to see a subset of the currencies, you can apply that filter at either the last (outermost) query stage, or (if the filter is based on information available in the exchange rate table, such as the currency code) at either the innermost sub-query (most efficient) or the middle sub-query (not so efficient unless the optimizer realizes it can push the filter down to the innermost sub-query).

Correctness is usually the primary criterion; performance is a secondary criterion. However, performance was mentioned in the question. The first rule is to measure the 'simple' query shown here. Only if that proves too slow do you need to worry further. When you do need to worry, you examine the query plan to see if there is, for example, a crucial index missing. Only if the query still isn't fast enough do you start trying to resort to other tricks. Those tricks tend to be very specific to a particular DBMS. For example, there might be optimizer hints that you can use to make the DBMS process the query differently.

Community
  • 1
  • 1
Jonathan Leffler
  • 730,956
  • 141
  • 904
  • 1,278
  • 1
    A fantastic answer, works perfectly and really well explained. I knew there had to be a decent way without concatenating/exploding a string of data. Regarding the backticks, yeah it's a MySQL habit and a bad habit. I could omit most of them but you just never know what will happen with an upgrade at a site that will break the query if a new keyword is introduced. Also the IS NULL/=NULL, yep I realise that... I just forgot to change that when pasting my demo query in the answer. THANKS!!! – batfastad May 15 '12 at 13:23
  • Thanks. For discussion of back-ticks etc, see also [What does the SQL Standard say about the usage of `?](http://stackoverflow.com/questions/10573922/what-does-the-sql-standard-say-about-usage-of/10574031#10574031). – Jonathan Leffler May 15 '12 at 15:58
2

If I've understood your problem correctly, all you need to do is self-join exchange_rates to select the rate of interest:

SELECT   currencies.currency_code,
         currencies.country,
         exchange_rates.id,
         FROM_UNIXTIME(exchange_rates.datestamp),
         exchange_rates.rate
FROM     currencies
  JOIN   (
    SELECT   currency_code, MAX(datestamp) AS datestamp
    FROM     exchange_rates
    WHERE    criteria_x IS NULL AND criteria_y LIKE 'A'
    GROUP BY currency_code
  )   AS exchange_wantd USING (currency_code)
  JOIN   exchange_rates USING (currency_code, datestamp)
ORDER BY currencies.country
eggyal
  • 122,705
  • 18
  • 212
  • 237
1

Try this query. It is expected to work fine but if you provide some data i will be able to do it properly

SELECT  `currencies`.`currency_code` as `CurrencyCode`,
    `currencies`.`country`, 
    FROM_UNIXTIME( SUBSTRING_INDEX( SUBSTRING_INDEX(`exchange_rates`.`concat`, '-', 1), '-', -1)) AS `datestamp`,
    SUBSTRING_INDEX( SUBSTRING_INDEX(`exchange_rates`.`concat`, '-', 2), '-', -1) AS `id`, 
    SUBSTRING_INDEX( SUBSTRING_INDEX(`exchange_rates`.`concat`, '-', 3), '-', -1) AS `rate`,
    (SELECT 
            MAX(CONCAT_WS('-', `datestamp`, `id`, `rate`)) AS `concat` 
            FROM `exchange_rates` 
            WHERE `criteria_x`= NULL 
            AND `criteria_y` LIKE 'A' 
            GROUP BY `exchange_rates`.`currency_code`
            HAVING `exchange_rates`.`currency_code` =`CurrencyCode`
    ) as `Concat`
FROM    `currencies`
ORDER BY `currencies`.`country` 
Muhammad Raheel
  • 19,823
  • 7
  • 67
  • 103
  • Thanks but that looks similar to what I've got above, just with a sub-select instead of the inner join on derived table. My 2nd one above works fine but I'm trying to find a way without concatenating fields then extrapolating a string. – batfastad May 04 '12 at 19:04
  • you might not have noticed here i have selected CurrencyCode in the outer query and then used it in the sub query in having clause to restrict search – Muhammad Raheel May 10 '12 at 14:03
0

If I understand you correctly the answer is quite simple if you don't force the database to think like a human. Like Jonathan Leffler I understand your intention to "select the most recent exchange rate record for each currency that meets the required other criteria".

"The most recent" is of course the problem, as that information is not explicitly stored in the DB, so we determine this information first.

SELECT currency_code, MAX(datestamp) AS datestamp FROM exchange_rates GROUP BY currency_code

We are going to reuse this so we give the result a name

(SELECT currency_code, MAX(datestamp) AS datestamp FROM exchange_rates GROUP BY currency_code) AS dates_we_want

The domain holding all the information we could possibly want is a record for each possible combination of dates_we_want, the currencies and exchange_rates tables

(SELECT currency_code, MAX(datestamp) AS datestamp FROM exchange_rates GROUP BY currency_code) AS dates_we_want, currencies AS c, exchange_rates AS er

Selection of records we want:

  • matching currency_codes

    dates_we_want.currency_code=er.currency_core AND dates_we_want.currency_code=c.currency_core

  • most recent rates

    dates_we_want.datestamp=er.datestamp

Projection into the result. You

want to retrieve a full record from the exchange_rates table

simply translates to

er.*

Put everything together in a SELECT-statement and give you the change to put your arbitrary constraints:

SELECT er.*
FROM
    (SELECT currency_code, MAX(datestamp) AS datestamp
       FROM exchange_rates GROUP BY currency_code
    ) AS dates_we_want,
    currencies AS c, exchange_rates AS er
WHERE
    dates_we_want.currency_code=er.currency_core
AND
    dates_we_want.currency_code=c.currency_core
AND
    dates_we_want.datestamp=er.datestamp
AND
    `criteria_x`=NULL AND `criteria_y` LIKE 'A' 
Jonathan Leffler
  • 730,956
  • 141
  • 904
  • 1,278
user1129682
  • 989
  • 8
  • 27
  • You have to push down the `criteria_x` and `criteria_y` conditions into the sub-query. What you're asking is a subtly different question. You are asking for the rows where the most recent row happens to meet the sub-criteria, rather than the most recent row that does meet the sub-criteria. I recommend avoiding the horizontal scroll-bar when possible. Personally, I don't think the keyword AND needs to be on a line on its own. I also strongly recommend using the explicit JOIN notation instead of the comma separated list of table expressions with joins in the WHERE clause. – Jonathan Leffler May 15 '12 at 15:52
  • I think you are arguing to solve the wrong problem. We do not want to relay the question word by word to the DBMS, we want to "retrieve a full record from the exchange_rates table for each currency [satisfying a set of constraints]". And my query does precisely that; on purpose, not accidently or by chance. That's the point of my very first sentece: "don't force the database to think like a human". Because it won't and DBMS are so good at combining, slicing, selecting and interpreting queries before execution, that it virtually doesn't pay off to think about intermediary results. – user1129682 May 15 '12 at 19:09