4

I have almost the same problem as described in this question. I have two tables with the same number of rows, and I would like to join them together one by one.

The tables are ordered, and I would like to keep this order after the join, if it is possible.

There is a rowid based solution for MSSql, but in SQLite rowid can not be used if the table is coming from a WITH statement (or RECURSIVE WITH).

It is guaranteed that the two tables have the exact same number of rows, but this number is not known beforehand. It is also important to note, that the same element may occur more than twice. The results are ordered, but none of the columns are unique.

Example code:

WITH
table_a (n) AS (
  SELECT 2
  UNION ALL
  SELECT 4
  UNION ALL
  SELECT 5
),
table_b (s) AS (
  SELECT 'valuex'
  UNION ALL
  SELECT 'valuey'
  UNION ALL
  SELECT 'valuez'
)
SELECT table_a.n, table_b.s
FROM table_a
LEFT JOIN table_b ON ( table_a.rowid = table_b.rowid )

The result I would like to achieve is:

(2, 'valuex'),
(4, 'valuey'),
(5, 'valuez')

SQLFiddle: http://sqlfiddle.com/#!5/9eecb7/6888

Iter Ator
  • 8,226
  • 20
  • 73
  • 164
  • 2
    Re: "The results are ordered, but none of the columns are unique." - not sure if you're already aware of all this but in SQL if no `ORDER BY` is explicitly defined then the order in which rows come back isn't defined. And if an `ORDER BY` *is* specified but there are values that are the same between different rows then the order in which those rows come back isn't defined. So given this, the way in which the two tables are joined wouldn't be predictable/deterministic. – Steve Chambers May 09 '18 at 09:43
  • (My take on the above comment:) What does "The tables are ordered" mean? Tables are not ordered, result sets [sic] are. There is no order to "keep". Even if there is an implementation order, it is not accessible via queries. Please clarify. Beware, you seem to have some misconceptions. (Eg re random(), operators are not guranteed to be called more than once per query on given arguments.) – philipxy May 10 '18 at 03:04

8 Answers8

1

Since the tables are ordered, you can add row_id values by comparing n values.

But still the best way in order to get better performance would be inserting the ID values while creating the tables.

http://sqlfiddle.com/#!5/9eecb7/7014

WITH
table_a_a (n, id) AS 
(
  WITH table_a (n) AS 
  (
  SELECT 2
  UNION ALL
  SELECT 4
  UNION ALL
  SELECT 5
  )
SELECT table_a.n, (select count(1) from table_a b where b.n <= table_a.n) id
FROM table_a
) ,
table_b_b (n, id) AS 
(
  WITH table_a (n) AS 
  (
   SELECT 'valuex'
  UNION ALL
  SELECT 'valuey'
  UNION ALL
  SELECT 'valuez'
  )
SELECT table_a.n, (select count(1) from table_a b where b.n <= table_a.n) id
FROM table_a
) 
select table_a_a.n,table_b_b.n  from table_a_a,table_b_b where table_a_a.ID = table_b_b.ID

or convert the input set to comma separated list and try like this:

http://sqlfiddle.com/#!5/9eecb7/7337

WITH RECURSIVE  table_b( id,element, remainder ) AS (
            SELECT 0,NULL AS element, 'valuex,valuey,valuz,valuz' AS remainder
                UNION ALL
            SELECT id+1,
                CASE
                    WHEN INSTR( remainder, ',' )>0 THEN 
                        SUBSTR( remainder, 0, INSTR( remainder, ',' ) )
                    ELSE
                        remainder
                END AS element,
                CASE
                    WHEN INSTR( remainder, ',' )>0 THEN 
                        SUBSTR( remainder, INSTR( remainder, ',' )+1 )
                    ELSE
                        NULL
                END AS remainder
            FROM table_b
            WHERE remainder IS NOT NULL
        ),
          table_a( id,element, remainder ) AS (
            SELECT 0,NULL AS element, '2,4,5,7' AS remainder
                UNION ALL
            SELECT id+1,
                CASE
                    WHEN INSTR( remainder, ',' )>0 THEN 
                        SUBSTR( remainder, 0, INSTR( remainder, ',' ) )
                    ELSE
                        remainder
                END AS element,
                CASE
                    WHEN INSTR( remainder, ',' )>0 THEN 
                        SUBSTR( remainder, INSTR( remainder, ',' )+1 )
                    ELSE
                        NULL
                END AS remainder
            FROM table_a
            WHERE remainder IS NOT NULL
        )
         SELECT table_b.element, table_a.element FROM table_b, table_a WHERE table_a.element IS NOT NULL and table_a.id = table_b.id;
Pelin
  • 936
  • 5
  • 12
  • Thank you. Since the `n` and `s` fields are not unique, what if the same value occurs more than once? – Iter Ator May 08 '18 at 09:54
  • Good point, in this case I assume the both s and n values will occur more right? Check this fiddle is it the result set you need? http://sqlfiddle.com/#!5/9eecb7/7018 – Pelin May 08 '18 at 10:39
  • No, the only guarantee is that the two tables have the same number of rows. This example returns 3 rows instead of 4: http://sqlfiddle.com/#!5/9eecb7/7019 – Iter Ator May 08 '18 at 11:18
  • If it is convenient for you to convert your input list to a comma-separated list this fiddle should help you to achieve the result set you need: http://sqlfiddle.com/#!5/9eecb7/7337 I have added this solution to answer as well. – Pelin May 09 '18 at 08:50
  • What do you mean, "the tables are ordered"? What is your justification that whatever you means is what the question means? What is the justification summarized by "since" in "Since the tables are ordered, you can ..."? – philipxy May 10 '18 at 03:54
  • I meant that the table is sorted. Which part is not clear? – Pelin May 11 '18 at 08:00
  • SQL tables have no order. Result sets have order. There is no order for the question to ask to "keep". It does not explain what it means by order, so how can you answer? – philipxy May 13 '18 at 02:23
  • it is stated in the question. User needs to match items in these 2 table according to their values from lowest to highest... – Pelin May 14 '18 at 08:57
1

SQL

SELECT a1.n, b1.s
FROM table_a a1
LEFT JOIN table_b b1
ON (SELECT COUNT(*) FROM table_a a2 WHERE a2.n <= a1.n) =
   (SELECT COUNT(*) FROM table_b b2 WHERE b2.s <= b1.s)

Explanation

The query simply counts the number of rows up until the current one for each table (based on the ordering column) and joins on this value.

Demo

See SQL Fiddle demo.

Assumptions

  1. A single column in used for the ordering in each table. (But the query could easily be modified to allow multiple ordering columns).
  2. The ordering values in each table are unique.
  3. The values in the ordering column aren't necessarily the same between the two tables.
  4. It is known that table_a contains either the same or more rows than table_b. (If this isn't the case then a FULL OUTER JOIN would need to be emulated since SQLite doesn't provide one.)
  5. No further changes to the table structure are allowed. (If they are, it would be more efficient to have pre-populated columns for the ordering).
Steve Chambers
  • 37,270
  • 24
  • 156
  • 208
  • If the column `o` is not unique (it is still a valid ordering), and the same elements occurs more than one, this method fails: http://sqlfiddle.com/#!5/9eecb7/7343 – Iter Ator May 09 '18 at 09:23
  • Was assuming the values in an ordering column would be unique - have added this as an assumption. If this assumption is incorrect and there is no secondary ordering column then there is no guaranteed order that the rows would be joined - would you want the join to be random (but 1-1) in this case? – Steve Chambers May 09 '18 at 09:29
  • 1
    I tried to introduce a random column to for the cases when there are repetitions in the table, but I get a strange result, with NULL values: http://sqlfiddle.com/#!5/9eecb7/7394 – Iter Ator May 09 '18 at 10:57
  • 2
    The results of your query are different each time it is executed. I don't think `RANDOM()` behaves as you might expect when used in a CTE in SQLite - it seems to get re-evaluated when the table is joined. To demonstrate this, look at the results of `SELECT a1.r, b1.r FROM table_ar a1 CROSS JOIN table_br b1` - would expect some of these values to be the same but they are all different. – Steve Chambers May 09 '18 at 15:00
  • Until @IterAtor gives sufficient requirements/assumptions, answers are just going to promote their not doing so. – philipxy May 10 '18 at 03:05
  • 1
    @philipxy My understanding from above is that Iter Ator would like the rows that are joined to follow the order of the ordering column wherever possible. But in the case where the ordering column has a group of duplicates then the join should still be one-to-one with the other table but picking rows at random from within this group. – Steve Chambers May 10 '18 at 08:18
  • 1
    @IterAtor Re: the strange `RANDOM()` behaviour, I've done some further investigation and asked a new question here: https://stackoverflow.com/questions/50289527. – Steve Chambers May 11 '18 at 09:50
1

This is quite complicated in SQLite -- because you are allowing duplicates. But you can do it. Here is the idea:

  • Summarize the table by the values.
  • For each value, get the count and offset from the beginning of the values.
  • Then use a join to associate the values and figure out the overlap.
  • Finally use a recursive CTE to extract the values that you want.

The following code assumes that n and s are ordered -- as you specify in your question. However, it would work (with small modifications) if another column specified the ordering.

You will notice that I have included duplicates in the sample data:

WITH table_a (n) AS (
      SELECT 2 UNION ALL
      SELECT 4 UNION ALL
      SELECT 4 UNION ALL
      SELECT 4 UNION ALL
      SELECT 5
     ),
     table_b (s) AS (
      SELECT 'valuex' UNION ALL
      SELECT 'valuey' UNION ALL
      SELECT 'valuey' UNION ALL
      SELECT 'valuez' UNION ALL
      SELECT 'valuez'
     ),
     a as (
      select a.n, count(*) as a_cnt,
             (select count(*) from table_a a2 where a2.n < a.n) as a_offset
      from table_a a
      group by a.n
     ),
     b as (
      select b.s, count(*) as  b_cnt,
             (select count(*) from table_b b2 where b2.s < b.s) as b_offset
      from table_b b
      group by b.s
     ),
     ab as (
      select a.*, b.*,
             max(a.a_offset, b.b_offset) as offset,
             min(a.a_offset + a.a_cnt, b.b_offset + b.b_cnt) - max(a.a_offset, b.b_offset) as cnt
      from a join
           b
           on a.a_offset + a.a_cnt - 1 >= b.b_offset and
              a.a_offset <= b.b_offset + b.b_cnt - 1
     ),
      cte as (
      select n, s, offset, cnt, 1 as ind
      from ab
      union all
      select n, s, offset, cnt, ind + 1
      from cte
      where ind < cnt
     )
select n, s
from cte
order by n, s;

Here is a DB Fiddle showing the results.

I should note that this would be much simpler in almost any other database, using window functions (or perhaps variables in MySQL).

Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
0

Either way...

Use something like

WITH
v_table_a (n, rowid) AS (
  SELECT 2, 1
  UNION ALL
  SELECT 4, 2
  UNION ALL
  SELECT 5, 3
),
v_table_b (s, rowid) AS (
  SELECT 'valuex', 1
  UNION ALL
  SELECT 'valuey', 2
  UNION ALL
  SELECT 'valuez', 3
)
SELECT v_table_a.n, v_table_b.s
FROM v_table_a
LEFT JOIN v_table_b ON ( v_table_a.rowid = v_table_b.rowid );

for "virtual" tables (with WITH or without),

WITH RECURSIVE vr_table_a (n, rowid) AS (
  VALUES (2, 1)
  UNION ALL
  SELECT n + 2, rowid + 1 FROM vr_table_a WHERE rowid < 3
)
, vr_table_b (s, rowid) AS (
  VALUES ('I', 1)
  UNION ALL
  SELECT s || 'I', rowid + 1 FROM vr_table_b WHERE rowid < 3
)
SELECT vr_table_a.n, vr_table_b.s
FROM vr_table_a
LEFT JOIN vr_table_b ON ( vr_table_a.rowid = vr_table_b.rowid );

for "virtual" tables using recursive WITHs (in this example the values are others then yours, but I guess you get the point) and

CREATE TABLE p_table_a (n INT);
INSERT INTO p_table_a VALUES (2), (4), (5);
CREATE TABLE p_table_b (s VARCHAR(6));
INSERT INTO p_table_b VALUES ('valuex'), ('valuey'), ('valuez');

SELECT p_table_a.n, p_table_b.s
FROM p_table_a
LEFT JOIN p_table_b ON ( p_table_a.rowid = p_table_b.rowid );

for physical tables.

I'd be careful with the last one though. A quick test shows, that the numbers of rowid are a) reused -- when some rows are deleted and others are inserted, the inserted rows get the rowids from the old rows (i.e. rowid in SQLite isn't unique past the lifetime of a row, whereas e.g. Oracle's rowid AFAIR is) -- and b) corresponds to the order of insertion. But I don't know and didn't find a clue in the documentation, if that's guaranteed or is subject to change in other/future implementations. Or maybe it's just a mere coincidence in my test environment.

(In general physical order of rows may be subject to change (even within the same database using the same DMBS as a result of some reorganization) and is therefore no good choice to rely on. And it's not guaranteed, a query will return the result ordered by physical position in the table as well (it might use the order of some index instead or have a partial result ordered some other way influencing the output's order). Consider designing your tables using common (sort) keys in corresponding rows for ordering and to join on.)

sticky bit
  • 36,626
  • 12
  • 31
  • 42
0

You can create temp tables to carry CTE data row. then JOIN them by sqlite row_id column.

CREATE TEMP TABLE temp_a(n integer);
CREATE TEMP TABLE temp_b(n VARCHAR(255));

WITH table_a(n) AS (
  SELECT 2 n
  UNION ALL
  SELECT 4
  UNION ALL
  SELECT 5
  UNION ALL
  SELECT 5
) 
INSERT INTO temp_a (n) SELECT n FROM table_a;

WITH table_b (n) AS 
(
  SELECT 'valuex'
  UNION ALL
  SELECT 'valuey'
  UNION ALL
  SELECT 'valuez'
  UNION ALL
  SELECT 'valuew'
)
INSERT INTO temp_b (n) SELECT n FROM table_b;

SELECT * 
FROM temp_a a 
INNER JOIN temp_b b on a.rowid = b.rowid;

sqlfiddle:http://sqlfiddle.com/#!5/9eecb7/7252

D-Shih
  • 44,943
  • 6
  • 31
  • 51
0

It is possible to use the rowid inside a with statement but you need to select it and make it available to the query using it. Something like this:

with tablea AS (
  select id, rowid AS rid from someids),
  tableb AS (
  select details, rowid AS rid from somedetails)
select tablea.id, tableb.details
from
    tablea
    left join tableb on tablea.rid = tableb.rid;

It is however as they have already warned you a really bad idea. What if the app breaks after inserting in one table but before the other one? What if you delete an old row? If you want to join two tables you need to specify the field to do so. There are so many things that could go wrong with this design. The most similar thing to this would be an incremental id field that you would save in the table and use in your application. Even simpler, make those into one table. Read this link for more information about the rowid: https://www.sqlite.org/lang_createtable.html#rowid

sqlfiddle: http://sqlfiddle.com/#!7/29fd8/1

jmriego
  • 335
  • 3
  • 15
0

It is possible to use the rowid inside a with statement but you need to select it and make it available to the query using it. Something like this:

with tablea AS (select id, rowid AS rid from someids),
  tableb AS (select details, rowid AS rid from somedetails)
select tablea.id, tableb.details
from
    tablea
    left join tableb on tablea.rid = tableb.rid;
buhtz
  • 10,774
  • 18
  • 76
  • 149
0

The problem statement indicates:

The tables are ordered

If this means that the ordering is defined by the ordering of the values in the UNION ALL statements, and if SQLite respects that ordering, then the following solution may be of interest because, apart from small tweaks to the last three lines of the sample program, it adds just two lines:

A(rid,n) AS (SELECT ROW_NUMBER() OVER ( ORDER BY 1 ) rid, n FROM table_a),
B(rid,s) AS (SELECT ROW_NUMBER() OVER ( ORDER BY 1 ) rid, s FROM table_b)

That is, table A is table_a augmented with a rowid, and similarly for table B.

Unfortunately, there is a caveat, though it might just be the result of my not having found the relevant specifications. Before delving into that, however, here is the full proposed solution:

WITH
table_a (n) AS (
  SELECT 2
  UNION ALL
  SELECT 4
  UNION ALL
  SELECT 5
),
table_b (s) AS (
  SELECT 'valuex'
  UNION ALL
  SELECT 'valuey'
  UNION ALL
  SELECT 'valuez'
),
A(rid,n) AS (SELECT ROW_NUMBER() OVER ( ORDER BY 1 ) rid, n FROM table_a),
B(rid,s) AS (SELECT ROW_NUMBER() OVER ( ORDER BY 1 ) rid, s FROM table_b)

SELECT A.n, B.s
FROM A LEFT JOIN B
ON ( A.rid = B.rid );

Caveat

The proposed solution has been tested against a variety of data sets using sqlite version 3.29.0, but whether or not it is, and will continue to be, "guaranteed" to work is unclear to me.

Of course, if SQLite offers no guarantees with respect to the ordering of the UNION ALL statements (that is, if the question is based on an incorrect assumption), then it would be interesting to see a well-founded reformulation.

peak
  • 105,803
  • 17
  • 152
  • 177