0


this is my first post here since most of the time I already found a suitable solution :)
However this time nothing seems to help properly.

Im trying to migrate information from some mysql Database I have just read-only access to.
My problem is similar to this one:
Group by doesn't give me the newest group
I also need to get the latest information out of some tables but my tables have >300k entries therefore checking whether the "time-attribute-value" is the same as in the subquery (like suggested in the first answer) would be too slow (once I did "... WHERE EXISTS ..." and the server hung up).
In addition to that I can hardly find the important information (e.g. time) in a single attribute and there never is a single primary key.
Until now I did it like it was suggested in the second answer by joining with subquery that contains latest "time-attribute-entry" and some primary keys but that gets me in a huge mess after using multiple joins and unions with the results.

Therefore I would prefer using the having statement like here:
Select entry with maximum value of column after grouping
But when I tried it out and looked for a good candidate as the "time-attribute" I noticed that this queries give me two different results (more = 39721, less = 37870)

SELECT COUNT(MATNR) AS MORE
    FROM(
    SELECT DISTINCT
        LAB_MTKNR AS MATNR,
        LAB_STG AS FACH,
        LAB_STGNR AS STUDIENGANG
    FROM
        FKT_LAB
    ) AS TEMP1


 SELECT COUNT(MATNR) AS LESS
    FROM(
    SELECT
        LAB_MTKNR AS MATNR,
        LAB_STG AS FACH,
        LAB_STGNR AS STUDIENGANG,
        LAB_PDATUM
    FROM
        FKT_LAB
    GROUP BY
        LAB_MTKNR,
        LAB_STG,
        LAB_STGNR
    HAVING LAB_PDATUM = MAX(LAB_PDATUM)
    )AS TEMP2

Although both are applied to the same table and use "GROUP BY" / "SELECT DISTINCT" on the same entries.
Any ideas?
If nothing helps and I have to go back to my mess I will use string variables as placeholders to tidy it up but then I lose the overview of how many subqueries, joins and unions I have in one query... how many temproal tables will the server be able to cope with?

Community
  • 1
  • 1
swenzel
  • 6,745
  • 3
  • 23
  • 37
  • feels like `HAVING LAB_PDATUM = MAX(LAB_PDATUM)` going to ignore some rows just depending in what order its reads them., see comment under the answer your linked: http://stackoverflow.com/questions/12873249/sql-select-entry-with-maximum-value-of-column-after-grouping#comment26686339_12873273 – Puggan Se Aug 13 '13 at 11:33

1 Answers1

2

Your second query is not doing what you expect it to be doing. This is the query:

SELECT COUNT(MATNR) AS LESS
FROM (SELECT LAB_MTKNR AS MATNR, LAB_STG AS FACH, LAB_STGNR AS STUDIENGANG, LAB_PDATUM
      FROM FKT_LAB
      GROUP BY LAB_MTKNR, LAB_STG, LAB_STGNR
      HAVING LAB_PDATUM = MAX(LAB_PDATUM)
    ) TEMP2;

The problem is the having clause. You are mixing an unaggregated column (LAB_PDATUM) with an aggregated value (MAX(LAB_PDATAUM)). What MySQL does is choose an arbitrary value for the column and compare it to the max.

Often, the arbitrary value will not be the maximum value, so the rows get filtered. The reference you give (although an accepted answer) is incorrect. I have put a comment there.

If you want the most recent value, here is a relatively easy way:

SELECT COUNT(MATNR) AS LESS
FROM (SELECT LAB_MTKNR AS MATNR, LAB_STG AS FACH, LAB_STGNR AS STUDIENGANG,
             max(LAB_PDATUM) as maxLAB_PDATUM
      FROM FKT_LAB
      GROUP BY LAB_MTKNR, LAB_STG, LAB_STGNR
    ) TEMP2;

It does not, however, affect the outer count.

Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
  • Thanks for your answer Gordon. That makes sense and is also what I would have thought if I had not read the other thread. In fact I overread something you didn't notice either. He groups by all requested columns and even says that this is necessary... but I don't know how the aggregate function behaves then. – swenzel Aug 13 '13 at 13:41
  • @swenzel . . . That caveat makes the answer even more non-sensical. The query is grouping by `lastActive` and then comparing it to the `max()`. Basically, this just ensures that the value is not null, because each value will be on a different row. – Gordon Linoff Aug 13 '13 at 14:19
  • My problem seems to be quite common. Nonetheless there is no solution that does not include subqueries. I found the corresponding mysql Documentation page http://dev.mysql.com/doc/refman/5.0/en/example-maximum-column-group-row.html and it turns out that I was already using the optimal way... Just another day wasted -.- – swenzel Aug 13 '13 at 14:53