1

I have a table X I want to update according to the entries in another table Y. The join between them is not unique. However, I want each entry in Y to update a different entry in X.

So if I have table X:

i (unique)  k           v         
----------  ----------  ----------
p           100         b         
q           101         a         
r           202         x         
s           301         a         

and table Y:

k (unique)  v         
----------  ----------
0           a         
1           b         
2           a         
3           c         
4           a         

I want to end up with table X like:

i           k           v
----------  ----------  ----------
p           1           b
q           0           a
r           202         x
s           2           a

The important result here is that the two rows in X with v = 'a' have been updated to two distinct values of k from Y. (It doesn't matter which ones.)

Currently, this result is achieved by an extra column and a program roughly like:

UPDATE X SET X.used = FALSE;
for Yk, Yv in Y:
    UPDATE X
       SET X.k = Yk,
           X.used = TRUE
     WHERE X.i IN (SELECT X.i FROM X
                    WHERE X.v = Yv AND NOT X.used
                    LIMIT 1);

In other words, the distinctness is achieved by "using up" the rows in Y. This doesn't scale well.

(I'm using SQLite3 and Python, but don't let that limit you.)

rptb1
  • 460
  • 4
  • 11
  • Are you using version 3.25 or later? – Gordon Linoff Dec 04 '18 at 22:37
  • 1
    The trick here is you want to assign a row number to each v entry in tables y an x based on the order of k and i (or some date order or whatever) This would then allow you to join both on v and the row number ensuring each record is allocated. This can be accomplished using a window function; but not available prior to 3.25. So then the question becomes how do we simulate it in 3.0.. perhaps by... https://stackoverflow.com/questions/16847574/how-to-use-row-number-in-sqlite – xQbert Dec 04 '18 at 22:45
  • SQLite version 3.22.0 (from the Ubuntu repo). No window functions. I'd like to avoid a custom local version but it's an option. – rptb1 Dec 04 '18 at 23:05
  • i is unique in this case, if that helps. No other combination of i, k, and v are unique. – rptb1 Dec 04 '18 at 23:06
  • A custom local version is the best way to use sqlite. That way you *know* you have the features you need available and aren't stuck on a random old version without optional modules you might want or newer features and fixes. – Shawn Dec 04 '18 at 23:25
  • That rather depends on the way the software will be delivered and deployed, but I take your point. – rptb1 Dec 04 '18 at 23:43
  • It also eliminates a dependency if you distribute your application. – Shawn Dec 04 '18 at 23:49
  • You'll need windowing functions for this, if it's possible. – Joel Coehoorn Dec 05 '18 at 04:31

1 Answers1

1

This can be solved by using rowids to pair up the results of a join. Window functions aren't necessary. (Thanks to xQbert for pointing me in this direction.)

First, we sort the two tables by v to make tables with rowids in a suitable order for the join.

CREATE TEMPORARY TABLE Xv AS SELECT * FROM X ORDER BY v;
CREATE TEMPORARY TABLE Yv AS SELECT * FROM Y ORDER BY v;

Then we can pick out the minimum rowid for each value of v in order to create a "zip join" for that value, pairing up the rows.

SELECT i, Yv.k, Xv.v
  FROM Xv JOIN Yv USING (v)
       JOIN (SELECT v, min(Xv.rowid) AS r FROM Xv GROUP BY v) AS xmin USING (v)
       JOIN (SELECT v, min(Yv.rowid) AS r FROM Yv GROUP BY v) AS ymin
         ON ymin.v = Xv.v AND Xv.rowid - xmin.r = Yv.rowid - ymin.r;

The clause Xv.rowid - min.x = Yv.rowid - min.y is the trick: it does a pairwise match of rows with the same value of v, essentially allocating one to the other. The result:

i           k           v         
----------  ----------  ----------
q           0           a         
s           2           a         
p           1           b         

It's then a simple matter to use the result of this query in an UPDATE.

WITH changes AS (<the SELECT above>)
   UPDATE X SET k = (SELECT k FROM changes WHERE i = X.i)
    WHERE i IN (SELECT i FROM changes);

The temporary tables could be restricted to the common values of v and possibly indexed on v if the query is large.

I'd welcome refinements (or bugs!)

rptb1
  • 460
  • 4
  • 11