-1

I'm wondering why this query doesn't give me the expected (by me) result. I have a table like this, with many rows. For each city there is a unique geonameid. Each city could have it's version in different languages.

alternatenameId | geonameid | isoLanguage | alternateName | isPreferredName
1554355            5128581        en          Big Apple
1614284            5128581        en        New York City   
9138713            5128581        en          New York            1

This is the query (I limited to one city to make an example, but I'll extend to more cities):

SELECT geonameid, 
       alternateName, 
       MAX(isPreferredName) 
FROM alternatename_new 
WHERE geonameid=5128581 
AND isoLanguage = 'en' 
GROUP BY geonameid

This is the query result

 geonameid | alternateName | isPreferredName
  5128581     Big Apple            1

But i expect to have this

 geonameid | alternateName | isPreferredName
  5128581      New York            1

What I'm doing wrong?

Ergest Basha
  • 7,870
  • 4
  • 8
  • 28
Luca
  • 848
  • 1
  • 14
  • 33
  • 1
    Incomplete GROUP BY will produce indefinite value in `alternateName` or an error if ONLY_FULL_GROUP_BY is enabled. – Akina Nov 25 '21 at 20:22
  • Putting `MAX()` around a column name only determines the value for that column in the output. It doesn't magically pick the corresponding row for other columns in your output. What would you expect `SELECT id, Col1, MIN(Col2), MAX(Col2) FROM fubar GROUP BY id` to do? What you need is something that picks the row you're interested in; normally a `WHERE` clause, sometimes using joins, sub-queries, etc. – MatBailie Nov 25 '21 at 20:33
  • For each `geonameid`, is there only ever one row where `isPreferredValue` is `1`? – MatBailie Nov 25 '21 at 20:36
  • @MatBailie: yes – Luca Nov 25 '21 at 20:40
  • @luca then just add `isPreferredName = 1` to the `WHERE` clause, instead of wishing that using `GROUP BY` and `MAX()` had some magical side effect. – MatBailie Nov 25 '21 at 20:42
  • @MatBailie Sorry. My mistake. For each city isPreferredName may occur only one time, but not all cities have isPreferredName filled... – Luca Nov 25 '21 at 20:52
  • @luca - which version of MySQL (MySQL 5.x doesn't have the same functionality as MySQL8)? What do you want to pick when no rows have `isPreferedName = 1`? The row with the lowest id? The name that's last alphabetically? All the names? None of the names? An error message? A banoffee cheese cake? – MatBailie Nov 25 '21 at 20:59
  • @Luca: if there is no row for a city which has isPreferredValue = 1, is it guaranteed that then only one row exists (with isPreferredValue being NULL)? – lukas.j Nov 25 '21 at 21:02

3 Answers3

1

If you were to do the following query, which alternateName would you expect to see in the result?

SELECT geonameid, alternateName, 
  MAX(isPreferredName),
  MIN(isPreferredName) 
FROM alternatename_new 
WHERE geonameid=5128581 and isoLanguage = 'en' 
GROUP BY geonameid

Should it be "Big Apple" or "New York"? That is, the alternateName that is found in the row that has the max value or min value?

What if multiple rows in the group all tied for the isPreferredName value, but had different alternateName values on each row? Which one should be returned?

What if you were using another aggregate function that doesn't give a value that occurs on any of the rows? E.g. COUNT() or AVG()?

The answer is that SQL aggregate functions produce a result, but the other columns in your query aren't related to the result of an aggregate function.

In MySQL, the other column alternateName is basically arbitrary. Its value comes from one of the rows in the group, but not the row where the max was found.

In practice, MySQL happens to return the value from the first row in the group, with respect to the index order it read the rows. But this is not standard, and not guaranteed. It's just a coincidence of the way the MySQL code was written.

In SQLite, for example, the value comes from the last row in the group.

In other brands of SQL database, as well as in the SQL standard, this sort of ambiguous query is not legal. MySQL permits it unless you set a more strict SQL mode. The strict SQL mode is enabled by default in recent versions of MySQL, and this is a good thing.

You should also read my answers to:

Bill Karwin
  • 538,548
  • 86
  • 673
  • 828
0

It seems that isPreferredName could be a boolean with 1 for true and 0 for false. If so, just add isPreferredName = 1 to the WHERE clause:

SELECT
  geonameid,
  alternateName,
  isPreferredName
FROM
  alternatename_new
WHERE
  isoLanguage = 'en' AND isPreferredName = 1
lukas.j
  • 6,453
  • 2
  • 5
  • 24
  • I cannot use isPreferredName = 1 since when I extend the query to all cities, not all of them have isPreferredName filled – Luca Nov 25 '21 at 20:50
  • In that case I think all you have to do is to add _ alternateName_ to your GROUP BY clause. – lukas.j Nov 25 '21 at 20:57
0

Your query don't use properly group by
in most recent version of mysql (>5.6) produce error(by default ) on other version of for only_group_by_mode seto to false produce unpredictable result fro your expetced result you should use

            select  a.geonameid, b.alternateName, a.max_id
    from (
        SELECT geonameid,  MAX(isPreferredName) max_id
        FROM alternatename_new 
        WHERE geonameid=5128581 and isoLanguage = 'en' 
        GROUP BY geonameid 
    ) a 
    inner join alternatename_new b on a.geonameid = b.geonameid and a.max_id = b.isPreferredName
ScaisEdge
  • 131,976
  • 10
  • 91
  • 107