39

I am trying to fetch the first and the last record of each group in a grouped query.
More precisely, I am running a query like this:

SELECT MIN(low_price), MAX(high_price), open, close
FROM symbols
WHERE date BETWEEN(.. ..)
GROUP BY YEARWEEK(date)

but I'd like to get the first and the last record of each group. It could be done by issuing many separate queries, but the table is quite large.

Is there a way to do this in MySQL, ideally with low processing time?

Nikita R.
Jimmy

4 Answers

65

You want to use GROUP_CONCAT and SUBSTRING_INDEX:

SUBSTRING_INDEX( GROUP_CONCAT(CAST(open AS CHAR) ORDER BY datetime), ',', 1 ) AS open,
SUBSTRING_INDEX( GROUP_CONCAT(CAST(close AS CHAR) ORDER BY datetime DESC), ',', 1 ) AS close

This avoids expensive subqueries, and I find it generally more efficient for this particular problem.

Check out the manual pages for both functions to understand their arguments, or visit this article which includes an example of how to do timeframe conversion in MySQL for more explanations.
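
For reference, a complete query built from these two expressions might look like the sketch below. It assumes the `symbols` table from the question, orders by its `date` column (the fragments above use a column named `datetime`), and uses two illustrative session variables, `@start_date` and `@end_date`, for the date range that the question leaves unspecified.

```sql
-- Hypothetical date range; the question does not say which bounds it uses.
SET @start_date = '2011-01-01';
SET @end_date   = '2011-12-31';

SELECT
    YEARWEEK(`date`) AS year_week,
    MIN(low_price)   AS week_low,
    MAX(high_price)  AS week_high,
    -- First value of the week: sort ascending, keep the first list element.
    SUBSTRING_INDEX(GROUP_CONCAT(CAST(`open` AS CHAR) ORDER BY `date`), ',', 1) AS `open`,
    -- Last value of the week: sort descending, keep the first list element.
    SUBSTRING_INDEX(GROUP_CONCAT(CAST(`close` AS CHAR) ORDER BY `date` DESC), ',', 1) AS `close`
FROM symbols
WHERE `date` BETWEEN @start_date AND @end_date
GROUP BY YEARWEEK(`date`);
```

The two extracted values come back as strings, so they may need a cast if numeric results are required.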

matpie
Joao Costa
  • 3
    Thanks for the crafty solution! Still, I find it unfortunate that MySQL doesn't support FIRST() and LAST(), which would be much faster than this... – Aweb Dec 02 '11 at 14:57
  • Excellent solution. I wondered about performance and memory considerations on large tables until I saw that the operation is confined to the size defined by `group_concat_max_len` (default 1024). Good times! – Mike S. Oct 26 '12 at 18:04
  • The performance of all subqueries is not the same. It is so obvious it is embarrassing to have to say it, but it is heavily dependent on the subquery and the query it is embedded in. And uncorrelated subqueries (where the execution of the subquery is not dependent on each row of the outer query) are no worse (or better) than they would be when run on their own. As the subquery in my solution below is... – Charles Bretana Jan 21 '13 at 03:57
  • Best solution for my problem and I looked a lot! Thanks! Avoids nasty subqueries or self-joins. – Apollo Data Oct 10 '13 at 19:43
  • could you write the full query? Thanks – giò Mar 20 '17 at 14:45
  • See the linked article for a full example. – Joao Costa Mar 20 '17 at 14:50
  • 1
    The article is down. – Sébastien De Spiegeleer Sep 09 '21 at 16:03
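
The `group_concat_max_len` setting mentioned in the comments can be inspected and changed per session, as in this minimal sketch. Because SUBSTRING_INDEX keeps only the first list element, a truncated list should still give the right value as long as that first element fits within the limit, though MySQL may report a truncation warning.

```sql
-- Show the current limit (in bytes) for GROUP_CONCAT results.
SHOW VARIABLES LIKE 'group_concat_max_len';

-- Raise it for the current session only; 1000000 is an arbitrary illustrative value.
SET SESSION group_concat_max_len = 1000000;
```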
2

Try this to start with:

Select YearWeek, Date, Min(Low_Price), Max(High_Price)
From
   (Select YEARWEEK(date) YearWeek, Date, Low_Price, High_Price
    From Symbols S
    Where Date BETWEEN(.. ..)
    GROUP BY YEARWEEK(date)) Z
Group By YearWeek, Date
Charles Bretana
0

Here is a great specific solution to this specific problem: http://topwebguy.com/first-and-last-in-mysql-a-working-solution/ It's almost as simple as using FIRST and LAST in MySQL.

I will include the code that actually provides the solution, but you can look up the whole text:

SELECT
word,

(SELECT a.ip_addr FROM article a
WHERE a.word = article.word
ORDER BY a.updated  LIMIT 1) AS first_ip,

(SELECT a.ip_addr FROM article a
WHERE a.word = article.word
ORDER BY a.updated DESC LIMIT 1) AS last_ip

FROM article GROUP BY word;
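
Applied to the question's table, the same correlated-subquery idea might look like the sketch below. It is an adaptation under assumptions, not the linked article's code: it uses the `symbols` columns from the question, aggregates per week in a derived table aliased `w`, and uses two illustrative session variables, `@start_date` and `@end_date`, for the unspecified date range.

```sql
-- Hypothetical date range; the question leaves the bounds unspecified.
SET @start_date = '2011-01-01';
SET @end_date   = '2011-12-31';

SELECT
    w.year_week,
    w.week_low,
    w.week_high,
    -- Earliest row of the week inside the range.
    (SELECT s.`open` FROM symbols s
     WHERE YEARWEEK(s.`date`) = w.year_week
       AND s.`date` BETWEEN @start_date AND @end_date
     ORDER BY s.`date` LIMIT 1) AS week_open,
    -- Latest row of the week inside the range.
    (SELECT s.`close` FROM symbols s
     WHERE YEARWEEK(s.`date`) = w.year_week
       AND s.`date` BETWEEN @start_date AND @end_date
     ORDER BY s.`date` DESC LIMIT 1) AS week_close
FROM (
    -- One row per week with the aggregated low and high.
    SELECT YEARWEEK(`date`) AS year_week,
           MIN(low_price)   AS week_low,
           MAX(high_price)  AS week_high
    FROM symbols
    WHERE `date` BETWEEN @start_date AND @end_date
    GROUP BY YEARWEEK(`date`)
) AS w;
```

Each subquery runs once per week rather than once per source row, but the `YEARWEEK(s.date)` comparison itself cannot use an index on `date`, so performance on a large table is worth measuring.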
EdChum
-1

Assuming that you want the ids of the records with the lowest low_price and the highest high_price, you could add these two columns to your query:

SELECT 

(SELECT id ORDER BY low_price ASC LIMIT 1) low_price_id,
(SELECT id ORDER BY high_price DESC LIMIT 1) high_price_id,

MIN(low_price), MAX(high_price), open, close
FROM symbols
WHERE date BETWEEN(.. ..)
GROUP BY YEARWEEK(date)

If efficiency is an issue you should add a column for 'year_week', add some covering indexes, and split the query in two.

The 'year_week' column is just an INT set to the value of YEARWEEK(date) and updated whenever the 'date' column is updated. This way you don't have to recalculate it for each query and you can index it.

The new covering indexes should look like this (the column order is important):

KEY yw_lp_id (year_week, low_price, id),
KEY yw_hp_id (year_week, high_price, id)
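
The schema change described above could be sketched as follows. This assumes the `symbols` table from the question has an `id` column, as the index definitions imply; keeping `year_week` in sync on later inserts and updates (in the application or via a trigger) is not shown.

```sql
-- Add the helper column (table and column names taken from the question).
ALTER TABLE symbols ADD COLUMN year_week INT;

-- Backfill it for existing rows.
UPDATE symbols SET year_week = YEARWEEK(`date`);

-- Add the two covering indexes described above.
ALTER TABLE symbols
    ADD KEY yw_lp_id (year_week, low_price, id),
    ADD KEY yw_hp_id (year_week, high_price, id);
```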

You should then use these two queries:

SELECT 
(SELECT id ORDER BY low_price ASC LIMIT 1) low_price_id,
MIN(low_price), open, close
FROM symbols
WHERE year_week BETWEEN(.. ..)
GROUP BY year_week

and

SELECT 
(SELECT id ORDER BY high_price DESC LIMIT 1) high_price_id,
MAX(high_price), open, close
FROM symbols
WHERE year_week BETWEEN(.. ..)
GROUP BY year_week

Covering indexes are pretty useful. Check this out for more details.

james.c.funk