2

i have two tables which doesn't have a common key i wanted to do a join on these two tables without a cartesian join. Table 1 had around 40,000 ( record count varies in every day production ) rows while table2 present count is 80,000 rows ( record count varies in every day production ).

TABLE1 :- NAME_VALUES

NAME_VAL
--------
TOM
DICK
HARRY 

TABLE2 :- CUS_TABLE

CUS_ID
---------
401795480  
201134211 
137643082 
876450821 
777290153 
111035791 
579865552 

I wanted to have some thing like below as output

401795480 TOM
201134211 DICK
137643082 HARRY
876450821 DICK
777290153 HARRY
111035791 TOM
579865552 DICK

My idea was to assign rownumber for each table . For table 2 i would like to restart the rownumber once the maximum count of table 1 is reached some thing like below but unable to figure out how can i perform

Table1
    NAME_VAL   TABLE1_RN 
    --------------------- 
    TOM          1
    DICK         2
    HARRY        3

Table2
    CUS_ID    TABLE2_RN
    --------------------
    401795480 1 
    201134211 2
    137643082 3
    876450821 1
    777290153 2
    111035791 3
    579865552 1 

Now i have a key and can easily map to fetch the details i require.

Please suggest if any method can satisfy my requirement.

davejal
  • 6,009
  • 10
  • 39
  • 82
mikey
  • 37
  • 8
  • This is not a programming problem. You could ask this question at another stack member site [dba](dba.stackexchange.com). – davejal Jan 10 '16 at 11:32
  • @davejal: Why not? SQL is a turing complete programming language – Lukas Eder Jan 10 '16 at 11:36
  • 1
    Probable what you need is the ntile analytic function? https://docs.oracle.com/cd/B19306_01/server.102/b14200/functions101.htm or just dbms_random.value() function: https://docs.oracle.com/cd/B28359_01/appdev.111/b28419/d_random.htm#i998095 . Depending on what you want to achieve. – stee1rat Jan 10 '16 at 11:38
  • i'm new to this forum and i wanted this to be performed via SQL not an other programming language like PL/SQL hence posted here in this forum. – mikey Jan 10 '16 at 11:38
  • Yes @LukasEder, we could have a discussion about that, but still it is better asked at dba, as it's clearly database (sql) related. See the [topics](http://stackoverflow.com/help/on-topic) of SO and the topics of [dba](http://dba.stackexchange.com/help/on-topic) – davejal Jan 10 '16 at 11:39
  • @davejal: We could, of course, discuss whether [sql](http://stackoverflow.com/questions/tagged/sql) with 300k questions thus far is on topic. But if you want to be sure, why not just take this discussion to http://meta.stackexchange.com – Lukas Eder Jan 10 '16 at 11:43
  • @Lukas Eder . Thanks for the quick reply , the query is working fine but it took 1 minute just to give 100 records. is there a way to tune this query. – mikey Jan 10 '16 at 11:52
  • @mikey: You can also comment directly on my answer... What constraints / indexes do your tables have? – Lukas Eder Jan 10 '16 at 11:58
  • [Related question](http://stackoverflow.com/q/17725891/521799) – Lukas Eder Jan 10 '16 at 12:12

2 Answers2

4

Canonical (possibly slow) solution showing the idea

Here's a rather slow solution using the modulo operator on your join predicate:

SELECT cus_id, name_val
FROM (
  SELECT cus_id, ROWNUM - 1 rn
  FROM cus_table
) c
JOIN (
  SELECT name_val, ROWNUM - 1 rn, MAX(ROWNUM) OVER() total
  FROM name_values
) n
ON n.rn = MOD(c.rn, n.total)
ORDER BY c.rn

The above yields

CUS_ID      NAME_VAL
--------------------
401795480   TOM
201134211   DICK
137643082   HARRY
876450821   TOM
777290153   DICK
111035791   HARRY
579865552   TOM

SQLFiddle here

Faster solutions using SQL

In order to speed up the above, you have several options, including creating materialized views for c and n, or pre-calculating the values for n.rn - 1 and MOD(c.rn - 1, n.total) in the source tables, while putting indexes on these pre-calculated tables.

Faster solution using PL/SQL

If you are allowed to write PL/SQL in your system, you could obviously resort to a PIPELINED function for this:

CREATE TYPE rec AS OBJECT (
  cus_id NUMBER(18),
  name_val VARCHAR2(50)
);
/

CREATE TYPE tab AS TABLE OF rec;
/

CREATE OR REPLACE FUNCTION f RETURN tab PIPELINED AS
  TYPE name_vals IS TABLE OF name_values.name_val%type;
  v_name_vals name_vals;
BEGIN
  SELECT name_val
  BULK COLLECT INTO v_name_vals
  FROM name_values;

  FOR cus IN (SELECT cus_id, ROWNUM rn FROM cus_table)
  LOOP
    PIPE ROW(rec(cus.cus_id, v_name_vals(MOD(cus.rn - 1, v_name_vals.count) + 1)));
  END LOOP;
  RETURN;
END;
/

And then use that function as follows:

SELECT * FROM TABLE(f);
Lukas Eder
  • 211,314
  • 129
  • 689
  • 1,509
  • @mikey: Slightly faster version using `ROWNUM` rather than window functions. Will think more about tuning... – Lukas Eder Jan 10 '16 at 11:55
  • there are no indexes on any of these tables this is how table was designed by the developers ad designers – mikey Jan 10 '16 at 12:03
  • I don't know enough about your system, but materialized views would certainly accelerate your query (although they would slow down DML on the `cus_table` and `name_values` tables). Also, I suspect your statistics aren't correct on your source tables, when your query runs that slowly. Is it true that the table is filled (and emptied again) very often? – Lukas Eder Jan 10 '16 at 12:10
  • The table is not emptied it is daily updated with the customers that had changes ( delta updates) . I had ran this SQL in my production and given the timing. – mikey Jan 10 '16 at 12:13
  • i had tried using materialized which had given 1000 records per minute which is still slow considering that we had around 80k . Is there a way i can achieve the same using SQL instead of PL/SQl ? – mikey Jan 10 '16 at 12:27
  • Did you put indexes on the `rn` columns in the materialized views? Also, what's wrong with the PL/SQL based solution? Btw: The solution is certainly slow, but probably not *that* slow. I do think you have some statistics problem, too. Hard to debug from here without the full picture. – Lukas Eder Jan 10 '16 at 12:30
  • my client doesnt want any PL/SQL based solution hence i couldn't do much in that as he is paying out the salary. i hadn't put any index on RN below is the sql i used – mikey Jan 10 '16 at 12:35
  • with table1 as ( SELECT cus_id, row_number() OVER(ORDER BY 1) rn FROM cus ) , table2 as (SELECT FIRST_NAME, MIDDLE_NAME, LAST_NAME , row_number() OVER(ORDER BY 1) rn, count(*) OVER() total FROM REF_NAME ) select cus_id , FIRST_NAME , table1.rn , table2.total , table2.rn - 1 ,MOD(table1.rn - 1, table2.total) from table1 , table2 where table2.rn - 1 = MOD(table1.rn - 1, table2.total) ; – mikey Jan 10 '16 at 12:36
  • @mikey: This will be hard to continue in the comments. If you have a specific question about how to speed this up, please ask a new question (with much more details about your system constraints). As far as your current question is concerned, I think my answer is about as good as it gets. – Lukas Eder Jan 10 '16 at 12:49
0

Using SQL Model clause, which doesn't require a join operation.

select cus_id, name_val
  from (select cus_id,
               mod(row_number() over (order by 1) - 1, (select count(*) cnt from name_values)) + 1 rn
          from cus_table)
 model
   reference nv on (select name_val, rownum rn from name_values) dimension by (rn) measures (name_val)
   main cus
   dimension by (cus_id, rn)
   measures (cast((null) as varchar2(4000)) as name_val)
   rules upsert all
 (
   name_val[any, any] = nv.name_val[cv(rn)]
 );

And a sample execution:

SQL> WITH
  2    name_values (name_val) AS (
  3      SELECT 'TOM'   FROM dual UNION ALL
  4      SELECT 'DICK'  FROM dual UNION ALL
  5      SELECT 'HARRY' FROM dual
  6    ),
  7    cus_table (cus_id) AS (
  8      SELECT 401795480 FROM dual UNION ALL
  9      SELECT 201134211 FROM dual UNION ALL
 10      SELECT 137643082 FROM dual UNION ALL
 11      SELECT 876450821 FROM dual UNION ALL
 12      SELECT 777290153 FROM dual UNION ALL
 13      SELECT 111035791 FROM dual UNION ALL
 14      SELECT 579865552 FROM dual
 15    )
 16  select cus_id, name_val
 17    from (select cus_id,
 18                 mod(row_number() over (order by 1) - 1, (select count(*) cnt from name_values)) + 1 rn
 19            from cus_table)
 20   model
 21     reference nv on (select name_val, rownum rn from name_values) dimension by (rn) measures (name_val)
 22     main cus
 23     dimension by (cus_id, rn)
 24     measures (cast((null) as varchar2(4000)) as name_val)
 25     rules upsert all
 26   (
 27     name_val[any, any] = nv.name_val[cv(rn)]
 28   );
    CUS_ID NAME_VAL
---------- --------------------
 401795480 TOM
 876450821 TOM
 579865552 TOM
 201134211 DICK
 777290153 DICK
 137643082 HARRY
 111035791 HARRY
7 rows selected
Francisco Sitja
  • 963
  • 4
  • 7