1

I am trying to return the price of the most recent record grouped by ItemNum and FeeSched, Customer can be eliminated. I am having trouble understanding how I can do that reasonably.

The issue is that I am joining about 5 tables containing hundreds of thousands of rows to end up with this result set. The initial query takes about a minute to run, and there has been some trouble with timeout errors in the past. Since this will run on a client's workstation, it may run even slower, and I have no access to modify server settings to increase memory / timeouts.

Here is my data:

Customer        Price            ItemNum           FeeSched             Date
   5            70.75             01202               12             12-06-2017
   5            70.80             01202               12             06-07-2016
   5            70.80             01202               12             07-21-2017
   5            70.80             01202               12             10-26-2016
   5            82.63             02144               61             12-06-2017
   5            84.46             02144               61             06-07-2016
   5            84.46             02144               61             07-21-2017
   5            84.46             02144               61             10-26-2016

I don't have access to create temporary tables, or views and there is no such thing as a @variable in C-tree, but in most ways it acts like MySql. I wanted to use something like GROUP BY ItemNum, FeeSched and select MAX(Date). The issue is that unless I put Price into the GROUP BY I get an error.

I could run the query again only selecting ItemNum, FeeSched, Date and then doing an INNER JOIN, but with the query taking a minute to run each time, it seems there is a better way that maybe I don't know.

Here is my query I am running, it isn't really that complicated of a query other than the amount of data it is processing. Final results are about 50,000 rows. I can't share much about the database structure as it is covered under an NDA.

SELECT DISTINCT 
CustomerNum,
paid as Price,
ItemNum,
n.pdate as newest
from admin.fullproclog as f
    INNER JOIN ( 
        SELECT 
               id,
               itemId,
               MAX(TO_CHAR(pdate, 'MM-DD-YYYY')) as pdate
        from admin.fullproclog 
        WHERE pdate > timestampadd(sql_tsi_year, -3, NOW())
        group by id, itemId
    ) as n ON n.id = f.id AND n.itemId = f.itemId AND n.pdate = f.pdate

    LEFT join (SELECT itemId AS linkid, ItemNum FROM   admin.itemlist) AS codes ON codes.linkid = f.itemId AND ItemNum >0
    INNER join (SELECT DISTINCT parent_id, 
                   MAX(ins1.feesched) as CustomerNum
    FROM   admin.customers AS p 
          left join admin.feeschedule AS ins1 
                 ON ins1.feescheduleid = p.primfeescheduleid 
          left join admin.group AS c1 
                 ON c1.insid = ins1.feesched 
    WHERE status =1
                 GROUP BY parent_id) 
      AS ip ON ip.parent_id = f.parent_id 

WHERE CustomerNum >0 AND ItemNum >0
UNION ALL

SELECT DISTINCT 
CustomerNum,
secpaid as Price,
ItemNum,
n.pdate as newest
from admin.fullproclog as f
    INNER JOIN ( 
        SELECT 
               id,
               itemId,
               MAX(TO_CHAR(pdate, 'MM-DD-YYYY')) as pdate
        from admin.fullproclog 
        WHERE pdate > timestampadd(sql_tsi_year, -3, NOW())
        group by id, itemId
    ) as n ON n.id = f.id AND n.itemId = f.itemId AND n.pdate = f.pdate

    LEFT join (SELECT itemId AS linkid, ItemNum FROM   admin.itemlist) AS codes ON codes.linkid = f.itemId AND ItemNum >0
    INNER join (SELECT DISTINCT parent_id, 
                   MAX(ins1.feesched) as CustomerNum
    FROM   admin.customers AS p 
          left join admin.feeschedule AS ins1 
                 ON ins1.feescheduleid = p.secfeescheduleid 
          left join admin.group AS c1 
                 ON c1.insid = ins1.feesched 
    WHERE status =1
                 GROUP BY parent_id) 
      AS ip ON ip.parent_id = f.parent_id 

WHERE CustomerNum >0  AND ItemNum >0 
Alan
  • 2,046
  • 2
  • 20
  • 43
  • What is your version of mySql? – jose_bacoy Mar 29 '18 at 23:09
  • If the query that produces this result is taking that long to run, it might use some optimization. It might also be modified to produce the end result you desire. Edit your question and show the schema of the 5 tables, and your query. – Sloan Thrasher Mar 29 '18 at 23:13
  • I added the [tag:greatest-n-per-group] tag. There are lots of answers to this type of question on Stack Overflow. For example https://stackoverflow.com/questions/1313120/retrieving-the-last-record-in-each-group-mysql – Bill Karwin Mar 29 '18 at 23:14
  • Technically, unless the database resides on each user's machine, it isn't running on their machine, they just have to wait for the server to process the request. So any optimization would need to be on the server side, or in the query. – Sloan Thrasher Mar 29 '18 at 23:15
  • Thanks guys, I updated my question with the query I am running. It takes about 25 seconds to run the first part, 35 to run the second after the `UNION ALL` even though there are less results in that set. Sloan, the database server does reside on each users computer, so all I can optimize is my query. – Alan Mar 29 '18 at 23:29

2 Answers2

2

I feel it quite simple when I'd read the first three paragraphs, but I get a little confused when I've read the whole question.

Whatever you have done to get the data posted above, once you've got the data like that it's easy to retrive "the most recent record grouped by ItemNum and FeeSched".

How to:

  • Firstly, sort the whole result set by Date DESC.
  • Secondly, select fields you need from the sorted result set and group by ItemNum, FeeSched without any aggregation methods.

So, the query might be something like this:

SELECT t.Price, t.ItemNum, t.FeeSched, t.Date 
FROM (SELECT * FROM table ORDER BY Date DESC) AS t 
GROUP BY t.ItemNum, t.FeeSched;

How it works:

When your data is grouped and you select rows without aggregation methods, it will only return you the first row of each group. As you have sorted all rows before grouping, so the first row would exactly be "the most recent record".

Contact me if you got any problems or errors with this approach.

walter
  • 1,199
  • 7
  • 13
  • Thank you, but I still get the `Non-group-by expression in select clause` error. I have to either do a `MIN, MAX, SUM` operation or include it in the `GROUP BY` statement. I wish there was an `IGNORE()` operation to ignore the differences in the price field and group it anyway – Alan Mar 30 '18 at 04:08
  • @Alan Are you using MySQL? What's your MySQL version? According to the Manual page(https://dev.mysql.com/doc/refman/5.7/en/group-by-handling.html), probably you are using a SQL92 or earlier version. – walter Mar 30 '18 at 04:32
  • If you're using an old SQL standard, you should either upgrade it or use an alternative way. Parveen Singla has provided you an alternative way, the only problem is that it might be pretty slow. – walter Mar 30 '18 at 04:43
  • It is not actually MySQL but is Faircom's C-Tree database. For the most part I can use most MySQL methods, but some just don't work. I reworked the problem and am going to paginate the problem so I can get through the whole query without it timing out. Thank you for your help. – Alan Mar 30 '18 at 05:47
0

You can also try like this:

Select Price, ItemNum, FeeSched, Date from table where Date IN (Select MAX(Date) from table group by ItemNum, FeeSched,Customer);

Internal sql query return maximum date group by ItemNum and FeeSched and IN statement fetch only the records with maximum date.

  • This is a great solution except that I would have to run the entire query above 2X along with all the joins. I ended up reworking it and only selecting 50 customers at a time. – Alan Mar 30 '18 at 05:49