Query to assign the same ID to row being inserted if it it already exists in the table

Question

I am inserting customer records into a table where, if a record with the same name already exists, I assign the same ID to the newly inserted record.

Assume table T has this record:

ID | Name | Phone_Number | Date_Inserted
105| Sam  | 111111       | 04/03/2014
106| Rita | 222222       |04/03/2014

And I'm inserting this from table A:

Name| Phone_Number
Sam | 333333

Then after insertion, table T should have:

ID | Name | Phone_Number | Date_Inserted
105| Sam  | 111111       | 04/03/2014
106| Rita | 222222       | 04/03/2014
105| Sam  | 333333       | 04/04/2014

Without the above change it would look like:

INSERT INTO T SELECT CustID.nextval,Name,Phone_Number,SYSDATE FROM A;

I was thinking of using,

INSERT INTO T
  SELECT CASE 
           WHEN NOT EXISTS(select null from T WHERE T.Name=A.Name) THEN CustID.nextVal
           ELSE (select ID from T where T.Name=A.Name) 
         END, 
         Name,
         Phone_Number,
         SYSDATE 
   FROM A;

But I'm not sure if it'll work and it seems redundant/bad for performance. If there's a preferred way to do this, please let me know.

Do you really want 2 records with the same primary ID in a table? Perhaps a sequence number would help to differentiate records. — Twelfth, Jul 31 '14 at 17:34
What is your primary key on table T and how can we know its the same Sam? — Jack M., Jul 31 '14 at 18:31
^The above was just a simplification. We can assume in this case that the Name column always identifies a person. I forgot to add the primary key here but it's another sequence. — user3808188, Jul 31 '14 at 18:52

Jeff N · Answer 1 · 2014-08-08T18:09:00.063

8

If your schema is not set in stone, I would perhaps reconfigure it so that there is a "person" table and a separate "person phone number" table. With that sort of set up, you can associate multiple phone numbers with one person, and you won't be stomping on IDs, or creating confusing secondary ID columns that aren't primary keys.

edited Aug 08 '14 at 18:09

answered Jul 31 '14 at 18:40

Jeff N

430
3
5

The above was a simplification of my schema, there are actually about 10 other columns that are imported as well. – user3808188 Jul 31 '14 at 18:50
@OP: Well, then simplify your schema as soon as you can. Working on a denormalized model like this will only increase your workload. A poor man pays twice and a poor programmer codes many times. – Rachcha Aug 08 '14 at 06:33
This is a data warehouse so honestly denormalization is expected to a degree. I agree with you that it isn't ideal but this is the schema I'm expected to follow. – user3808188 Aug 08 '14 at 15:20

kinjelom · Answer 2 · 2014-08-07T09:11:59.060

If table A is small:

insert into T (id, name, phone_number, date_inserted)
select 
    nvl((select max(id) from T where T.name=A.name ), custid.nextval), 
    a.name, a.phone_number, sysdate 
from A

If table A is big:

insert into T (id, name, phone_number, date_inserted)
select nvl(DT.id,custid.nextval), a.name, a.phone_number, sysdate 
from A 
left outer join (
    select max(id) id, name from T 
    where name in (select distinct name from A)
    group by T.name
) DT 
on A.name=DT.name

If you want to "move" rows from table A to table T:

begin
  for row in (select name, phone_number, rowid rid from A) loop
     insert into T (id, name, phone_number, date_inserted) 
     select 
        nvl((select max(id) from T where T.name=row.name ), custid.nextval), 
        row.name, row.phone_number, sysdate
     from dual; 
     delete from A where rowid=row.rid;
  end loop;
end; 
/

score 1 · Accepted Answer · edited Jun 20 '20 at 09:12

The characterisation of anything as "bad" is subjective. As long as the results are correct, something is only "bad" if it takes too long or uses too many system resources. You define "long" and "too many". If something is returning the correct results, in an acceptable time, using an acceptable amount of system resources then there is no need to change .

There are, however, a number of things that you can look at out for (assuming that altering your data-model is not an acceptable solution):

You're going to want an index on NAME, ID as you're selecting on NAME and returning ID.
Your second correlated sub-query, (select ID from T where T.Name=A.Name), is returning multiple rows, which is going to cause an error. You either need to limit the result set to a single row, or to utilise some aggregate function. It seems better to add an additional condition where rownum < 2 to limit the results as adding an aggregate will force Oracle to perform a range scan over every row that has that name whereas you only need to find whether it exists.
CASE claims that it performs short-circuit evaluation; this isn't necessarily true when you get sequences involved.
I don't think it will affect your INSERT statement but it might be worth changing your DATE_INSERTED column to have a default; it means that you don't need to add it to every query and you can't forget to do so:
```
 alter table t modify date_inserted date default sysdate;
```

Putting these (pretty small) changes together your query might look like:

insert into t (id, name, phone_number)
select coalesce( select id from t where name = a.name and rownum < 2
               , custid.nextval
                 )
     , name
     , phone_number
  from a

Only you can tell whether this is acceptable or not.

I do something very similar - For one analytical database I have to maintain an old data-based primary key. The only way I could get the thing to work was running it in a background job every minute, using correlated sub-queries and explicitly adding a rownum restriction on the number of potential rows. I know that it's "better" to maintain this in the INSERT statement but the execution time was unacceptable. I know that the code can only deal with at most 10,000 rows a minute but it doesn't matter as I only add at most 5,000 rows a minute to the table. These numbers might change in the future and as the table grows the execution plan might change as well - when it does I'll deal with the problem then rather than attempting to solve a problem that doesn't exist.

tl;dr

Every bit of code is okay until it isn't. While knowledge and experience can help code to remain okay for longer don't prematurely optimise if there's no need to optimise.

score 1 · Answer 4 · answered Aug 03 '14 at 12:52

Your version of the insert query will generate an error for the third and subsequent rows. I agree with @JeffN that you should fix the schema, because you clearly have a "person" entity and a "telephone" entity. But, given that you don't want to do that, the query you want is:

INSERT INTO T(id, name, phone_number, date_inserted)
   SELECT (CASE WHEN oldid is null THEN CustID.nextVal
                ELSE oldid 
           END) as Id, Name, Phone_Number, SYSDATE 
   FROM (select a.*, (select max(id) from T where T.Name = A.Name) as OldId
         from A
        ) a;

For the purposes of this query, you should create an index on T(Name, Id):

create index idx_t_name_id on t(name, id);

You could also wrap this in a before insert trigger. I usually use a before insert trigger for auto-incrementing column in older versions of Oracle, rather than putting the sequence values in explicitly.

score 1 · Answer 5 · answered Aug 08 '14 at 01:05

I'd probably use an inline view to get distinct id,name and then outer join to it.

INSERT INTO T
(   id
,   name
,   phone_number
,   date_inserted
)
SELECT  NVL( TVW.id, CustID.nextval )
    ,   A.name
    ,   A.phone_number
    ,   SYSDATE
FROM    A
    ,   (   SELECT DISTINCT id, name
            FROM   T
        ) TVW
WHERE   A.name = TVW.name (+)

score 1 · Answer 6 · answered Aug 08 '14 at 06:24

i will suggest this kind of procedure.

create or replace procedure insert_id(name_in     in varchar2,
                                  phone_in    in number,
                                  date_ins_in date default sysdate) is
 cursor names is
 select id, name from names;
 type id is table of names.id%type;
 type name is table of names.name%type;
 sql_text varchar2(4000);
 r_ct     pls_integer;
 l_id     id;
 l_name   name;

begin

   open names;
   fetch names bulk collect
   into l_id, l_name;
   close names;

r_ct := 0;

for i in l_id.first .. l_id.last

 loop

  if l_name(i) = name_in then
  sql_text :=    q'{insert into names values(}' || q'{'}' 
                                                || l_id(i) 
                                                ||q'{'}' 
                                                || ',' 
                                                || q'{'}'
                                                || name_in 
                                                || q'{'}' 
                                                || ',' 
                                                ||q'{'}' 
                                                || phone_in 
                                                || q'{'}'
                                                || ',' 
                                                || q'{'}' 
                                                ||date_ins_in 
                                                || q'{'}'
                                                || ')';

    execute immediate sql_text;
    r_ct := sql%rowcount;
    commit;
    exit;
   end if;
 end loop;

 if r_ct != 1 then

for i in l_id.first .. l_id.last 
 loop
  if l_name(i) != name_in then
    sql_text := 'insert into names values(' || q'{'}' 
                                            || CustID.nextval --this part may be wrong, i guess it will be easy to correct, if something's wrong
                                            || q'{'}' 
                                            || ',' 
                                            || q'{'}' 
                                            || name_in
                                            || q'{'}' 
                                            || ',' 
                                            || q'{'}' 
                                            || phone_in 
                                            || q'{'}' 
                                            || ',' 
                                            || q'{'}' 
                                            || date_ins_in 
                                            || q'{'}'
                                            || ')';

execute immediate sql_text;
commit;
exit;
end if;
end loop;
end if;
end;

Bacs · Answer 7 · 2014-08-08T14:55:08.493

I agree with the suggestions about splitting the table if you can. Your ID column really looks like it ought to be a foreign key, joining one person record to multiple phone numbers.

Assuming you can't change the tables, is it possible that there will be duplicate names in table A which don't yet appear in table T? If so, you'll need to write some PL/SQL and process the records one at a time. For example, if A contains ...

Name| Phone_Number
Sam | 333333
Tom | 444444
Tom | 555555

... you won't be able to process the records in a single insert, because Tom's ID won't be available in table T. Tom will end up with two IDs in table T.

Given the sample data you provided, your insert will work. My version below will do exactly the same thing and ought to be slightly more efficient. Notice that the nextval of your sequence will be evaluated whether or not it's used, so you'll find that sequence numbers are skipped wherever an ID from table t is reused. If that's a problem, you're likely looking at writing a bit of PL/SQL.

insert into t 
            (id
            ,name
            ,phone_number
            ,date_inserted)
   select    nvl(t.id,CustID.nextval)
            ,a.name
            ,a.phone_number
            ,sysdate
   from a
   left join t on a.name = t.name;

score 0 · Answer 8 · answered Aug 09 '14 at 19:20

I mostly use mysql, so not sure about the oracle syntax but logically we can achieve this using a if statement and a sub-query. Something like this :

INSERT INTO T SELECT (CASE WHEN (SELECT COUNT(ID) FROM T WHERE Name=A.Name) > 0 THEN (SELECT ID FROM T WHERE Name=A.Name where ROWNUM <= 1) ELSE CustID.nextval END),Name,Phone_Number,SYSDATE FROM A;

Again, I'm not a Oracle programmer so syntax could be incorrect.

Query to assign the same ID to row being inserted if it it already exists in the table

8 Answers8

tl;dr