0

The following query pulls data correctly as expected, however the left join with lnk_cat_isrc table and through that to catalogue table, brings back repeated data if there is more than one item in catalogue which has the same isrcs from isrc table:

SELECT 
                isrc.ISRC,
                isrc.Track_Name,
                isrc.ArtistName,
                isrc.TitleVersion,
                isrc.Track_Time,
                `isrc_performer`.`PerformerName` ,
                `performer_category`.`PerformerCategory` ,
                `isrc_performer`.`PerformerRole` ,
                `isrc`.`isrc_ID`,
                `isrc_performer`.`Perf_ID`

        FROM `isrc`

        LEFT JOIN `isrc_performer` ON (isrc.isrc_ID = isrc_performer.isrc_ID)
        LEFT JOIN `performer_category` ON (performer_category.PerfCat_ID = isrc_performer.PerfCat_ID)
        LEFT JOIN `lnk_cat_isrc` ON (lnk_cat_isrc.isrc_ID = isrc.isrc_ID)
        LEFT JOIN `catalogue` ON (catalogue.ID = lnk_cat_isrc.cat_id) 
        ORDER BY   isrc_ID     desc LIMIT 0 , 10
        ";

I cannot use group by on isrc, because the isrc_performer table can have more than one performer to an isrc.

So the relations are like this: Few items from catalogue table can have several identical items from isrc table. In turn, each isrc can have more than one entry in isrc_performer table.

What I want is to display all corresponding data from isrc_performer in relation to each isrc, but not repeating it for each item from catalogue table. I also want to display all the rest "empty" isrcs (those which don't have any data in isrc_performer table)

Can you give me any ideas?

P.S. despite I'm not pulling any data from catalogue table itself, I'm using it to search by a catalogue number, when user defines search criteria for $where_condition variable, hence I need to keep it in the query. i.e. $where_condition = "catalogue.Catalogue LIKE '%test%' OR ISRC LIKE '%test%' OR Track_Name LIKE '%test%' OR ArtistName LIKE '%test%' OR TitleVersion LIKE '%test%' OR PerformerName LIKE '%test%' OR PerformerCategory LIKE '%test%' OR PerformerRole LIKE '%test%'";

------UPD:

trying to graphically represent possible variation in these 3 tables relations:

cat1 - isrc1 - performer1
       isrc2 - performer1
             - performer2
             - performer3

cat2 - isrc2 - performer1
             - performer2
             - performer3
     - isrc3 - performer2
             - performer4

cat3 - isrc4
     - isrc1 - performer1

UPD (pics added)

Here are screen prints. As you can see on picture 1 there are 9 rows with same isrc number, however there are 3 repeated performers Jason, David, Paul.

enter image description here

This is because 3 different catalogue items have this exact isrc with 3 different performers as per pic 2

enter image description here

= 1(isrc) * 3(catalogue) * 3(performers) = 9 row on output

All I want is that Performers grid would only display 3 rows of this isrc for each performer.

Elen
  • 2,345
  • 3
  • 24
  • 47
  • 2
    See [Why should I provide an MCVE for what seems to me to be a very simple SQL query](http://meta.stackoverflow.com/questions/333952/why-should-i-provide-an-mcve-for-what-seems-to-me-to-be-a-very-simple-sql-query) – Strawberry Aug 17 '17 at 18:22
  • @Strawberry i have updated my question – Elen Aug 17 '17 at 18:28
  • So i guess `distinct` doesn't solve the duplicate problem. So then it's not really a duplicate problem but a repeating value prolbme? you dont' want to see column values repeat? But then you have no order by which would make the repeating values make some sense. an example expected results would help showing what you get now and what you want. – xQbert Aug 17 '17 at 18:38
  • @xQbert, I've added `order by` to my question, however it makes no impact. and yes you are right - it's not duplicated, it's displaying correctly according to my query, i just want to modify it more to remove repeated rows – Elen Aug 17 '17 at 18:42
  • Data presentation is best handled in the presentation layer not the database. However, if you Must NULL out the values; then you could keep a user variable for each column and check to see if the old value is the same as new; if it is the same Display null and keep the variable, if it's not the same set the variable to the field value. Normally you don't do this at the db layer because if you want to give the user a "Export to excel" function, you have to have separate queries. otherwise the null values make it difficult for them to splice the data w/o filling in what has been blanked out. – xQbert Aug 17 '17 at 18:45
  • @xQbert yes, the more i think about it the easier i think it would be to filter the returned array with some php, but lets see if anyone has any suggestions.. – Elen Aug 17 '17 at 18:49
  • http://rextester.com/FVMR97502 gives you an example of how to accomplish what I believe you're after: NULLing out repeated column values for specific columns when an order by is defined I just don't recommend it. – xQbert Aug 17 '17 at 18:54
  • @xQbert that is very interesting for not displaying isrc detail each time for a different performer, however it's the join with `catalogue` table that makes `isrc` repeat each time there is a relation between `catalogue` items in `isrc` items – Elen Aug 17 '17 at 18:59
  • 1
    Then why do you need the join if no data is being grouped, limited or displayed from that table? This goes back to the 1st comment. Post sample data of what you're getting and what you want to want it to be. Picture worth a 1000 words. – xQbert Aug 17 '17 at 19:00
  • @xQbert because i still use that table to search by `catalogue` items, but when there is no search specified, i get back repeated lines for each `catalogue` item. – Elen Aug 17 '17 at 19:02
  • Now I get it (or at least I think I do) :P I'll update my answer – xQbert Aug 17 '17 at 19:04
  • Possible duplicate of [Multiple Self-Join based on GROUP BY results](https://stackoverflow.com/questions/47758492/multiple-self-join-based-on-group-by-results) – philipxy Jun 21 '18 at 17:08
  • There is still no [mcve]--code we can cut & paste & run. Also your "what I want" is unclear. Also putting words in scare quotes does not make them clear. Use enough words, phrases & sentences to say what you mean. PS Please [use text, not images/links, for text (including code, tables & ERDs)](https://meta.stackoverflow.com/q/285551/3404097). Use an image only for convenience to supplement text and/or for what cannot be given in text. Use edit functions to inline, not link, if you have the rep--make your post self-contained. And never give a diagram without a legend/key. – philipxy Jun 21 '18 at 17:10

1 Answers1

1

---Rearrange the answer to put the "best" option up top.. .but is all of this for naught.. w/o any data from lnk_cat_isrc or catalogue being returned, why does filtering on catalog make a difference? we're returning all isrc regardless of any filtering because it's a left join...

So this brings into question given sample data what are the expected results.

Possibly more elegant... (but not sure if it would be faster) moving away from exists and simply using a distinct in a subquery so catalog queries always return 1 row per isrc; solving the 1-M problem keeping the left join thereby keeping the isrc records not in the catalog limits. Return all isrc information performer information if it exists, performer category info if it exists and catalogue information If, and only if it matches the catalog filters.

SELECT isrc.ISRC
    , isrc.Track_Name
    , isrc.ArtistName
    , isrc.TitleVersion
    , isrc.Track_Time
    ,`isrc_performer`.`PerformerName` 
    ,`performer_category`.`PerformerCategory` 
    ,`isrc_performer`.`PerformerRole` 
    ,`isrc`.`isrc_ID`
    ,`isrc_performer`.`Perf_ID`
FROM `isrc`
LEFT JOIN `isrc_performer` 
  ON isrc.isrc_ID = isrc_performer.isrc_ID
LEFT JOIN `performer_category` 
  ON performer_category.PerfCat_ID = isrc_performer.PerfCat_ID
LEFT JOIN (SELECT distinct lnk_cat_isrc.isrc_ID
           FROM `lnk_cat_isrc` 
           INNER JOIN `catalogue` 
             ON catalogue.ID = lnk_cat_isrc.cat_id
           WHERE...) DCat
   ON Dcat.isrc_ID = isrc.isrc_ID
ORDER BY   isrc_ID     desc 
LIMIT 0 , 10;

As you pointed out the join is causing the problem. So eliminate the join and use the exists notation. Distinct would also work since you're not selecting any values from catalog; though exists should be faster.

Fast but doesn't include all isrc records... (not sure why the or not exists should bring them back in...)

SELECT isrc.ISRC
     , isrc.Track_Name
     ,isrc.ArtistName
     ,isrc.TitleVersion
     ,isrc.Track_Time
     ,`isrc_performer`.`PerformerName` 
     ,`performer_category`.`PerformerCategory` 
     ,`isrc_performer`.`PerformerRole` 
     ,`isrc`.`isrc_ID`
     ,`isrc_performer`.`Perf_ID`
    FROM `isrc`
    LEFT JOIN `isrc_performer` 
      ON (isrc.isrc_ID = isrc_performer.isrc_ID)
    LEFT JOIN `performer_category` 
      ON (performer_category.PerfCat_ID = isrc_performer.PerfCat_ID)
    WHERE EXISTS (SELECT * 
                  FROM  `lnk_cat_isrc` 
                  INNER JOIN `catalogue` 
                    ON catalogue.ID = lnk_cat_isrc.cat_id
                   --and your other criteria
                  WHERE (lnk_cat_isrc.isrc_ID = isrc.isrc_ID)
                  ) 
     OR NOT EXISTS (SELECT * 
                    FROM `lnk_cat_isrc` 
                    WHERE lnk_cat_isrc.isrc_ID = isrc.isrc_ID
    ORDER BY isrc_ID desc 
    LIMIT 0 , 10

Or using select distinct simple straight forward; but slow

 SELECT isrc.ISRC
     , isrc.Track_Name
     ,isrc.ArtistName
     ,isrc.TitleVersion
     ,isrc.Track_Time
     ,`isrc_performer`.`PerformerName` 
     ,`performer_category`.`PerformerCategory` 
     ,`isrc_performer`.`PerformerRole` 
     ,`isrc`.`isrc_ID`
     ,`isrc_performer`.`Perf_ID`
  FROM `isrc`
  LEFT JOIN `isrc_performer` 
    ON (isrc.isrc_ID = isrc_performer.isrc_ID)
  LEFT JOIN `performer_category` 
    ON (performer_category.PerfCat_ID = isrc_performer.PerfCat_ID)
  LEFT JOIN `lnk_cat_isrc` 
    ON (lnk_cat_isrc.isrc_ID = isrc.isrc_ID)
  LEFT JOIN `catalogue` 
    ON (catalogue.ID = lnk_cat_isrc.cat_id) 
   --AND (other criteria on catalog here, cause in a where clause you left joins will behave like inner joins)
  ORDER BY isrc_ID desc 
  LIMIT 0 , 10;
xQbert
  • 34,733
  • 2
  • 41
  • 62
  • right, `exist` deffo runs faster, however it still acts as my original query - repeating all lines for each `catalogue` items... `Distinct` does exactly what i'm expecting, but it is slower... and yes, i don't get the `isrc`s with no performers added when i specify a cat.num... – Elen Aug 17 '17 at 20:26
  • That doesn't make much sense to me the exists can't have the repeated records from catalog as catalog isn't in a join anymore. ahhh.. lnk_cat_isrc has cat_ID we would need move it's join into the exists as well... and since you're not displaying data form lnk_cat_isrc it should work... and that explains why the distinct worked. and this didn't ;P Not sure if the exists should be a left join or if it could be an inner... I'll leave that for you to ponder; it makes a difference on where you have to put the catalog critieria/filters. – xQbert Aug 17 '17 at 20:40
  • to clarify - `where exists` will only display isrc's that are linked to some catalogue? meaning that if there are isrc's which are not connected to any cat.num they won't be displayed in the output? – Elen Aug 17 '17 at 21:19
  • Dang... you're making me think! Yes it would omit them and based on all the left joins you want them. we can handle it with a not exists and an or... but I'm trying to see if there's a more elegant approach.(updated using not exists) still looking – xQbert Aug 17 '17 at 21:23
  • Im sorry for making it more complicated! :D but unfortunately there are cases when an isrc exist, but not connected to any catalogue... it looks like i should think of php way to filter original query... unless ur genius brain can come up with something :D i mean pivot table in excel would just display all the rest isrcs under blank catalogue - how do they do that? :)) – Elen Aug 17 '17 at 21:30
  • Did the messy approach work (using an or not exists)? (i'm struggling w/ elegant at the moment) and updated w/ a possibly elgant approach. – xQbert Aug 17 '17 at 21:38
  • it displays the same amount of entries as before, basically still omitting some records.. – Elen Aug 17 '17 at 21:40
  • Dang well the more elegant approach should solve the problem. and if all else fails the distinct works albeit slow just need to work on fine tuning. Would really help me if we had a sample data set on something like rextester.com or sqlfiddle.com [Mind blown taking a few minutes for myself] – xQbert Aug 17 '17 at 21:41
  • oh... got thousands of records there... let me select few for you and try to put it on rextester.com (i'm new to it) – Elen Aug 17 '17 at 21:45
  • I think the elegant approach keeping with left joins will return correct records I'm just not so sure about the efficiency) – xQbert Aug 17 '17 at 21:57
  • the elegant solution gives me error: `Error Code: 1054. Unknown column 'isrc.isrc_ID' in 'where clause' 0.000 sec ` still trying to get http://rextester.com to work, but i have a lot of private data there which i cannot disclose... it will take a while to redo some... – Elen Aug 17 '17 at 22:13
  • I only have 1 where clause and it's ... so did you put anything after the where that references isrc.isrc_id? if so it should probably reference lnk_cat_isrc.isrc_ID instead. or maybe as w where clause before the last order by? – xQbert Aug 17 '17 at 22:16
  • yes i just tried with `WHERE ( catalogue.Catalogue LIKE '%171477%' OR isrc.ISRC LIKE '%171477%')` and it doesn't recognise `isrc.ISRC` now... – Elen Aug 17 '17 at 22:20
  • `isrc.isrc` is not in scope in the subquery. however, `lnk_cat_isrc.isrc_ID` is in scope and it should be the same value; when they match of course. so change `isrc.isrc` to `lnk_cat_isrc.isrc_ID` – xQbert Aug 17 '17 at 22:21
  • yes i can see what you mean... hence this also means that i cannot search by `isrc.ArtistName` or `isrc_performer.PerformerRole`... i have to think about it more =D you are great anyway! – Elen Aug 17 '17 at 22:34
  • you can, the searches have to be on the joins since you're using left joins. – xQbert Aug 17 '17 at 23:00
  • After various tests with `EXISTS` and your elegant solution and combination of `where` statements, I came to conclusion that the only way to go is to use `Distinct`. It is slower EXISTS or the first query, but at least it gives me opportunity to run where statement across all fields. Thank you very much – Elen Aug 22 '17 at 09:23