1

How can I put this into one query, the key point is that when grouping artist by name I need to consider all rows in artist, but I don't want to keep rows in artist_nodup table if it already is in mbartist_discogsartist table. (I'm using Postgres 9.3)

INSERT INTO artist_nodup
SELECT
min(a1.id) as id,
a1.name
FROM artist  a1
GROUP BY a1.name
HAVING COUNT(*)=1
;

DELETE
FROM artist_nodup T1
WHERE exists
( select 1
   from mbartist_discogsartist T2
  where t1.id = t2.discogs_id
)
;
Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
Paul Taylor
  • 13,411
  • 42
  • 184
  • 351

2 Answers2

1

If it's enough to avoid the redundant rows in the INSERT and there is no need to delete previously existing rows from artist_nodup as well:

INSERT INTO artist_nodup
SELECT a.id, a.name
FROM   artist a
LEFT   JOIN artist a2 ON a2.name = a.name AND a2.id <> a.id
LEFT   JOIN mbartist_discogsartist m ON m.discogs_id = a.id
WHERE  a2.name IS NULL
AND    m.discogs_id IS NULL;

The 1st LEFT JOIN rules out rows from artist where other rows exist with the same name.
The 2nd LEFT JOIN rules out rows from artist where a matching id exists in mbartist_discogsartist (replacing the later DELETE).

LEFT JOIN / IS NULL is one of a couple of techniques to do this:

If you also want to rule out pre-existing rows in artist_nodup:

...
LEFT JOIN artist_nodup an ON an.name = a.name
...
AND an.name IS NULL

You get the drill ...

Community
  • 1
  • 1
Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
  • Im well viewed in the 2nd LEFT JOIN usage, but the first is interesting I never thought of that instead of GROUP BY HAVING COUNT(*)=1, but what do you mean by 'If it's enough to avoid the redundant rows in the INSERT' ? – Paul Taylor Mar 12 '15 at 16:37
  • Your original `DELETE` would also delete pre-exsisting, offending rows in `artist_nodup`, which my answer does not. – Erwin Brandstetter Mar 12 '15 at 16:46
  • thats 'and there is no need to delete previously existing rows from artist_nodup as well:' but what does 'If it's enough to avoid the redundant rows in the INSERT' mean ? – Paul Taylor Mar 13 '15 at 09:26
  • @PaulTaylor: Your original example inserts some redundant rows, which are then deleted again. My query does not insert the rows to begin with - *avoids the redundant rows* - which is considerably cheaper and should normally be the superior solution. But it may not be 100 % the same thing, for instance if there are triggers on `artist_nodup`. – Erwin Brandstetter Mar 13 '15 at 15:10
  • ah understand, yeah the reason why I did the delete is because I couldnt work out to do the group by over the whole but only insert where not already in other table, thanks again the solution is just what i wanted. – Paul Taylor Mar 13 '15 at 15:36
0

You can create Stored Procedure.

CREATE PROCEDURE sp_InsertDelete
AS
BEGIN
    INSERT INTO artist_nodup
    SELECT
    min(a1.id) as id,
    a1.name
    FROM artist  a1
    GROUP BY a1.name
    HAVING COUNT(*)=1
    ;
    GO
    DELETE
    FROM artist_nodup T1
    WHERE exists
    ( select 1
       from mbartist_discogsartist T2
      where t1.id = t2.discogs_id
    )
    ;
END

And to use both queries you can call new created stored procedure: EXECUTE sp_InsertDelete


Or you can create TRIGGER on artist_nodup table. It automatically will use delete query when something will be inserted to artist_nodup table.

Note: If you need to use delete query only after provided insert query, this wont work for you, that because this delete query always will be used after anything will be inserted.

CREATE TRIGGER AutoDelete ON artist_nodup 
AFTER INSERT 
AS
BEGIN
     DELETE
     FROM artist_nodup T1
     WHERE exists
     ( select 1
        from mbartist_discogsartist T2
       where t1.id = t2.discogs_id
     )
     ;   
END;
  • Thanks but I was looking for a pure sql solution, is that not possible – Paul Taylor Mar 12 '15 at 15:19
  • Triggers and stored procedures are all "pure SQL." Not every operation is designed to be done in a single statement. Likely what you want to do is put your commands into a text file, and then feed that psql such as: `psql -q dbname < mycommands.txt` In this way, you can execute both operations with a single command. – wadesworld Mar 13 '15 at 05:54
  • No I was looking for efficiency and simplicity and your stored procedure still runs two queries instead of 1 and is more complex so its not any improvement for me, but thanks anyway. – Paul Taylor Mar 13 '15 at 09:28