A join that "allocates" from available rows

Question

I have a table X I want to update according to the entries in another table Y. The join between them is not unique. However, I want each entry in Y to update a different entry in X.

So if I have table X:

i (unique)  k           v         
----------  ----------  ----------
p           100         b         
q           101         a         
r           202         x         
s           301         a

and table Y:

k (unique)  v         
----------  ----------
0           a         
1           b         
2           a         
3           c         
4           a

I want to end up with table X like:

i           k           v
----------  ----------  ----------
p           1           b
q           0           a
r           202         x
s           2           a

The important result here is that the two rows in X with v = 'a' have been updated to two distinct values of k from Y. (It doesn't matter which ones.)

Currently, this result is achieved by an extra column and a program roughly like:

UPDATE X SET X.used = FALSE;
for Yk, Yv in Y:
    UPDATE X
       SET X.k = Yk,
           X.used = TRUE
     WHERE X.i IN (SELECT X.i FROM X
                    WHERE X.v = Yv AND NOT X.used
                    LIMIT 1);

In other words, the distinctness is achieved by "using up" the rows in Y. This doesn't scale well.

(I'm using SQLite3 and Python, but don't let that limit you.)

The trick here is you want to assign a row number to each v entry in tables y an x based on the order of k and i (or some date order or whatever) This would then allow you to join both on v and the row number ensuring each record is allocated. This can be accomplished using a window function; but not available prior to 3.25. So then the question becomes how do we simulate it in 3.0.. perhaps by... https://stackoverflow.com/questions/16847574/how-to-use-row-number-in-sqlite — xQbert, Dec 04 '18 at 22:45
SQLite version 3.22.0 (from the Ubuntu repo). No window functions. I'd like to avoid a custom local version but it's an option. — rptb1, Dec 04 '18 at 23:05
i is unique in this case, if that helps. No other combination of i, k, and v are unique. — rptb1, Dec 04 '18 at 23:06
A custom local version is the best way to use sqlite. That way you *know* you have the features you need available and aren't stuck on a random old version without optional modules you might want or newer features and fixes. — Shawn, Dec 04 '18 at 23:25
That rather depends on the way the software will be delivered and deployed, but I take your point. — rptb1, Dec 04 '18 at 23:43
It also eliminates a dependency if you distribute your application. — Shawn, Dec 04 '18 at 23:49

rptb1 · Answer 1 · 2018-12-06T16:25:26.557

This can be solved by using rowids to pair up the results of a join. Window functions aren't necessary. (Thanks to xQbert for pointing me in this direction.)

First, we sort the two tables by v to make tables with rowids in a suitable order for the join.

CREATE TEMPORARY TABLE Xv AS SELECT * FROM X ORDER BY v;
CREATE TEMPORARY TABLE Yv AS SELECT * FROM Y ORDER BY v;

Then we can pick out the minimum rowid for each value of v in order to create a "zip join" for that value, pairing up the rows.

SELECT i, Yv.k, Xv.v
  FROM Xv JOIN Yv USING (v)
       JOIN (SELECT v, min(Xv.rowid) AS r FROM Xv GROUP BY v) AS xmin USING (v)
       JOIN (SELECT v, min(Yv.rowid) AS r FROM Yv GROUP BY v) AS ymin
         ON ymin.v = Xv.v AND Xv.rowid - xmin.r = Yv.rowid - ymin.r;

The clause Xv.rowid - min.x = Yv.rowid - min.y is the trick: it does a pairwise match of rows with the same value of v, essentially allocating one to the other. The result:

i           k           v         
----------  ----------  ----------
q           0           a         
s           2           a         
p           1           b

It's then a simple matter to use the result of this query in an UPDATE.

WITH changes AS (<the SELECT above>)
   UPDATE X SET k = (SELECT k FROM changes WHERE i = X.i)
    WHERE i IN (SELECT i FROM changes);

The temporary tables could be restricted to the common values of v and possibly indexed on v if the query is large.

I'd welcome refinements (or bugs!)

A join that "allocates" from available rows

1 Answers1