0

My data looks like this:

TableA
- id INT
- is_in_table_b BOOL

TableB
- id INT
- table_a_id INT

I accidentally wiped out the 'is_in_table_b' BOOL on my dev machine while reorganizing the data structures, and I forgot how I created it. It's just a shortcut for some dev benchmarks.

All the "UPATE ... FROM ...." variations I tried are setting everything as "true" based on a the join. I can't remember if I originally had a CAST in this.

Does anyone know of a simple , elegant way to accomplish this? I just want to set is_in_table_b to True if the TableA.id appears in TableB.table_a_id. I know some non-elegant ways with inner queries, but I want to remember the more-correct ways to do this. I'm positive I had this done in an "Update From" originally.

Jonathan Vanasco
  • 15,111
  • 10
  • 48
  • 72

3 Answers3

2

This one should be simple enough:

UPDATE tableA SET
 is_in_table_b = exists (select 1 FROM tableB WHERE table_a_id=tableA.id);
Daniel Vérité
  • 58,074
  • 15
  • 129
  • 156
  • Whoops :) Removed that bad link. – erik258 Mar 05 '14 at 02:20
  • I think the exact syntax I wanted was the Update From `UPDATE table_a SET is_in_b = TRUE FROM table_b WHERE table_b.table_a_id = table_a.id ;`, inspired by Dan Farrel in the other answer. This works and is succinct, but in my table structure is not nearly as efficient ( 1700ms vs 41100ms ). – Jonathan Vanasco Mar 05 '14 at 16:46
  • @Jonathan: that's a different query. The one I suggest sets `is_in_b` to false when there's no matching entry in tableB, whereas the one you suggest leaves it in its previous, *wiped out* state (presumably NULL?) for these cases. – Daniel Vérité Mar 05 '14 at 17:18
  • yes you're right. i'd have to do 2 queries, but setting all to NULL first is no real performance hit. I think the more correct version would be this: `UPDATE table_a SET is_in_b = CASE WHEN b.table_a_id IS NOT NULL THEN True WHEN b.table_a_id IS NULL THEN False END FROM table_a a LEFT JOIN table_b b ON b.table_a_id = a.id WHERE table_a.id = a.id ;` – Jonathan Vanasco Mar 05 '14 at 18:19
1

yeah, do a JOIN between the tables for an UPDATE.

the setup:

CREATE TABLE table_a ( 
  id int not null auto_increment primary key, 
  is_in_b boolean
);
CREATE TABLE table_b (
  table_a_id int
);
-- create some test data in table_a;
INSERT INTO table_a (is_in_b) VALUES (FALSE), (FALSE), (FALSE);
INSERT INTO table_a (is_in_b) SELECT FALSE 
    FROM table_a a1 
    JOIN table_a a2 
    JOIN table_a a3;
-- and create a subset of matching data in table_a;
INSERT INTO table_b (table_a_id) 
    SELECT id FROM table_a ORDER BY RAND() limit 5;

now the answer:

UPDATE table_a 
  JOIN table_b ON table_a_id = table_a.id
SET is_in_b = TRUE;

See the results with

SELECT * from table_b; 

SELECT * FROM table_a WHERE is_in_b;

Works on http://sqlfiddle.com/#!2/8afc0/1 - should work in Postgres too I think.

erik258
  • 14,701
  • 2
  • 25
  • 31
  • 1
    You know that SQLFiddle supports various versions of PostgreSQL, right? Just pick from the list at the top. – mu is too short Mar 05 '14 at 01:56
  • Thanks. Wrong sql, right direction. The "Update Join" is a mysql particular. The Postgres/other format is "Update From" , which would be... `UPDATE table_a SET is_in_b = TRUE FROM table_b WHERE table_b.table_a_id = table_a.id ;` – Jonathan Vanasco Mar 05 '14 at 16:23
1

Consider to drop that redundant column altogether and use a view or a "generated column" instead (with the EXISTS expression provided by @Daniel). Details under this related question:
Store common query as column?

Just be sure to have an index on TableB.table_a_id.

Community
  • 1
  • 1
Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
  • Thanks. That's a great technique and applicable elsewhere, but not this particular situation. I'm using a pre-computed notational field for analytics in development. Table_a is relatively small but table_b has tens-of-millions of records -- even with an index, the hit to compute it as an expression within our current SELECTS is way too much. – Jonathan Vanasco Mar 05 '14 at 16:38