3

I need to find a way to eliminate a dependent sub-query.

I have a table of articles that can have multiple languages. The simplified table structure is as follows:

id, title, language, translation_set_id

1 A    en 0
2 B    en 2
3 B_ru ru 2
4 C    en 4
5 C_ru ru 4
6 D    en 6
7 D_fr fr 6

The translation_set_id is 0 when an article doesn't have translations, or is set to the id of the base translation. So B is the original English article, and B_ru is the Russian translation of the article.

I need a query that would allow me to return all Russian articles, or if they don't exist the original language articles. So it would return.

1 A    en 0
3 B_ru ru 2
5 C_ru ru 4
6 D    en 6

So far I have this:

SELECT id, title, language, translation_set_id
FROM articles a
WHERE 
  a.translation_set_id = 0
  OR (a.language = 'ru')
  OR (a.id = a.translation_set_id AND
       0 = (SELECT COUNT(ac.id)
            FROM articles ac
            WHERE ac.translation_set_id = a.translation_set_id 
            AND ac.language = 'ru')
     )

But this executes the sub-query for each row, creating a dependent query. Is there a way to eliminate the dependent query?

UPDATE: It seems that Neels's solution works, thanks!

But I was wondering if there is a way to generalize the solution to multiple language fallbacks? First try to get French, if that's not present, try Russian, and if that's not present, show base translation (English, or any other, depending on the original creation language)?

UPDATE2: I've built the query I needed for the updated question using Neel's solution and DRapp's solution. It can be found here http://www.sqlfiddle.com/#!2/28ca8/18 but I'll also past the queries here, for completeness sake.

Revised Data:

CREATE TABLE articles (
  id INT,
  title VARCHAR(20),
  language VARCHAR(20),
  translation_set_id INT);

INSERT INTO articles values
  (1,'A','en',0),
  (2,'B','en',2),
  (3,'B_ru','ru',2),
  (4,'C','en',4),
  (5,'C_ru','ru',4),
  (6,'D','en',6),
  (7,'D_fr','fr',6),
  (8,'E_ru','ru', 0),
  (9,'F_fr','fr', 0),
  (10,'G_ru','ru', 10),
  (11,'G_fr','fr', 10),
  (12,'G_en','en', 10);

Original query with 2 correlated sub-queries:

SELECT id, title, language, translation_set_id
FROM articles a
WHERE
  a.translation_set_id = 0
  OR (a.language = 'fr')
  OR (a.language = 'ru' AND
       0 = (SELECT COUNT(ac.id)
            FROM articles ac
            WHERE ac.translation_set_id = a.translation_set_id
            AND ac.language = 'fr'))
  OR (a.id = a.translation_set_id AND
       0 = (SELECT COUNT(ac.id)
            FROM articles ac
            WHERE ac.translation_set_id = a.translation_set_id
            AND (ac.language = 'fr' OR ac.language = 'ru'))
     );

Revised query:

SELECT  a.*
FROM articles a
LEFT JOIN articles ac ON ac.translation_set_id = a.id
  AND ac.language = 'fr'
LEFT JOIN articles ac2 ON ac2.translation_set_id = a.id
  AND ac2.language = 'ru'
WHERE a.translation_set_id = 0
  OR a.language = 'fr'
  OR (a.language = 'ru' AND ac.id IS NULL)
  OR (a.id = a.translation_set_id AND ac2.id IS NULL AND ac.id IS NULL);
alcroito
  • 401
  • 3
  • 13

4 Answers4

2

Per slight modification adjustment from Ypercube on a more simplified where clause, and your need to NOT be able to utilize coalesce(), I have revised to this below.

Get all articles that are either Translated = 0, OR The ID IS the same as the Translated indicating it must have been the original document before it was translated to something else. That said, you are guaranteed all original documents.

Now, the left-join. IF THERE IS a corresponding "Russian" article (or other language translation of interest), grab that ID and it's translated title along with it. So the returned record has both the original AND the translated references.

SELECT
      a1.id as OriginalAricleID,
      a1.title as OriginalTitle,
      a1.language as OriginalLanguage,
      a2.id as TranslatedAricleID,
      a2.title as TranslatedTitle
   from
      Articles a1
         LEFT JOIN Articles a2
            ON a1.id = a2.translation_set_id
            AND a2.language = 'ru'
   where
         a1.translation_set_id = 0
      OR a1.id = a1.translation_set_id 

It goes through the table once and no duplicates. The left-join points to the same articles table, but ONLY for the Russian language set based on the original article.

DRapp
  • 47,638
  • 12
  • 72
  • 142
  • This is a clever solution, but unfortunately I can't use coalesce because of CMS constraints. Other than that, works great. – alcroito Feb 24 '14 at 13:53
  • You can change the 3 where conditions to `where a1.language = 'en' ;` – ypercubeᵀᴹ Feb 24 '14 at 14:07
  • @ypercube, however, what if there is an original language in 'fr' and translated to 'ch'. There is no 'en' version as a basis to include. That is why I did it this way. The first of an artical has 0 translation ID. If not, then any subsequently translated document gets the original ID, and the original is stamped with its own ID, thus the equality test. – DRapp Feb 24 '14 at 14:14
  • @Placinta, why not using coalesce(), could you get away with a case/when? or an IF()?, otherwise you would need a union to get all Russian, then all others into the same column names. – DRapp Feb 24 '14 at 14:17
  • @ypercube, true, but we can't guess all the underlying elements of their data :) But the translation to itself would indicate the original, even if it were a higher number, but it would make more sense to have an original document, THEN a newer one AFTER that. – DRapp Feb 24 '14 at 14:19
  • Yeah, you are right. It would be easier if that column did not have `0` but was equal to the `id` column. So, we could check if a document is original with `WHERE id = translation_set_id` – ypercubeᵀᴹ Feb 24 '14 at 14:20
  • @ypercube, that is what I did in the last OR... the ID = translation but only if it was not already determined to be Russian. – DRapp Feb 24 '14 at 14:21
  • @ypercube, AAAHHhhhh... I see it. Good simplification of my original query. – DRapp Feb 24 '14 at 14:39
  • @Placinta, revised answer to remove coalesce to see if that helps you for a solution. – DRapp Feb 24 '14 at 15:05
  • @DRapp Your query returns both the original titles and the translated title, if they are present, otherwise NULL. That is not exactly what I needed. But your query together with Neels's helped me forge the one exactly I need. – alcroito Feb 25 '14 at 10:02
  • @Placinta, and Neels where clause is what I originally started with too. – DRapp Feb 25 '14 at 12:01
1

You could use a LEFT JOIN:

SELECT a.id, a.title, a.language, a.translation_set_id
  FROM articles a
 LEFT JOIN articles ac ON ac.translation_set_id = a.translation_set_id 
                      AND ac.language = 'ru'
 WHERE a.translation_set_id = 0
    OR (a.language = 'ru')
    OR (    a.id = a.translation_set_id 
        AND ac.id IS NULL
       )
 GROUP BY a.id, a.title, a.language, a.translation_set_id
Oscar Pérez
  • 4,377
  • 1
  • 17
  • 36
  • Why is there a need for Group By? Also is it legal to have a group by clause, without any aggregation expressions? – alcroito Feb 24 '14 at 13:06
  • It is legal. And it is needed because we are using a `join` and it could happen that, without the `GROUP BY` there were repeated results (depending on the table's data, of course) – Oscar Pérez Feb 24 '14 at 14:01
1

Check out this SQL Fiddle:

http://www.sqlfiddle.com/#!2/c05d0/15

You can use this simple query to achieve your result.

SELECT  a.*
FROM articles a
LEFT OUTER JOIN articles ac ON ac.translation_set_id = a.translation_set_id 
AND ac.language = 'ru'
WHERE a.translation_set_id = 0
OR a.language = 'ru'
OR (a.id = a.translation_set_id AND ac.id IS NULL); 
Neels
  • 2,547
  • 6
  • 33
  • 40
  • this query could give repeated values, isn't it? I think it should have a `DISTINCT` or a `GROUP BY` – Oscar Pérez Feb 24 '14 at 11:42
  • I thought of giving a DISTINCT, but looking at the data it did not look necessary, since the conditions in the `WHERE` clause will be satisfied for only one title. – Neels Feb 24 '14 at 11:48
  • This seems to work, thanks! I came close to something like this, but I joined a.id = ac.translation_set_id, instead of both translation ids. Also, I'm not sure if it's okay to ask another question in the same place, but would it be possible to generalize the query, to go through multiple language fallback? So, try to first show Russian, if not found, try to show French, and if that's not found, the original base translation (English)? – alcroito Feb 24 '14 at 13:08
1

Rewrite this part:

AND
       0 = (SELECT COUNT(ac.id)
            FROM articles ac
            WHERE ac.translation_set_id = a.translation_set_id 
            AND ac.language = 'ru')

into an anti-join condition:

AND NOT EXISTS (
                SELECT 1
                FROM articles ac
                WHERE ac.translation_set_id = a.translation_set_id 
                AND ac.language = 'ru'
)

This may speed-up the query, because MySql must always read all rows to obtain the count(),
but when using NOT EXISTS (or EXISTS) it stops reading the table when it finds 1st row that meets criteria.

krokodilko
  • 35,300
  • 7
  • 55
  • 79
  • It might speed up the query, but I was wondering if there was a solution without running the sub-query for each row. – alcroito Feb 24 '14 at 13:06