0

I have various strings in a table in SQL Server/Oracle db, like this:

 String1  
 String2  
 String3             

I need to convert them in to positive integers say some thing like:

String1 = 12  
String2 = 25  
String3 = 45    

I am not expecting the numbers in any order. Only requirement is number generated for one string should not conflict with other. Either new strings can be inserted into the table or old strings can be deleted. Now later if I do not have 2nd string instead there is a new string

String1 = 12    
String3 = 45    
String4 = 28    

Note that the number 25 generated for 2nd string earlier can not be used for the new string. Using extra storage (temp tables) is not allowed. I used row_number function but later on realized that the integers generated with this function will get changed when new strings are inserted (as order of strings will change)

Note: My strings are 128 bit GUIDs.

Note: If I am allowed to use a table (Unique Integer - GUID mapping) at my local server (not at client's machine), how can I make sure that new GUID strings are not allocated the integer assigned earlier ?

Any inputs to solve this problem will be appreciated.

Anks
  • 11
  • 6
  • How about Identity colunmn? – Teju MB Jun 24 '15 at 03:03
  • Any changes in database are not allowed. Database has a column with unique strings. I need to use SELECT query to retrieve those strings and assign unique integer to them. Number assigned to a string during 1st run should be same during second run too (considering string insertion and deletion can take place from db). No temp table creation allowed. – Anks Jun 24 '15 at 03:07
  • You're being asked to (1) store information, (2) without using storage. I don't think that's logically possible. – Mike K Jun 24 '15 at 03:11
  • May i know how your integers have been assigned to the string? – Teju MB Jun 24 '15 at 03:12
  • SELECT ROW_NUMBER() OVER (ORDER BY UniqueString column) as IntegerID FROM table x – Anks Jun 24 '15 at 03:14
  • The above query would work if no insertions/deletions are taking place. – Anks Jun 24 '15 at 03:17
  • I am wondering if there is any function that could produce integer for a string (like hash). So unless the strings are same, hash will be same but if strings get changed, new hash value will be generated. – Anks Jun 24 '15 at 03:19
  • @Anks: a hashing function could work, but only so far. See http://stackoverflow.com/questions/16521148/string-to-unique-integer-hashing for a more detailed explanation. – Mike K Jun 24 '15 at 03:22
  • @Anks guids are randomly generated, no pure function producing an int (if you mean 32 or 64 bit ints) can guarantee there will be no repeating values. – 1010 Jun 25 '15 at 04:11
  • 1
    Please explain *why* you need numbers. It can't be for uniqueness, because you could enforce that with GUIDs. – APC Jun 25 '15 at 05:37
  • I need numbers because my database design accepts int as a primary key. if I change primary key to be varchar, I'll have to test the affects of this modification in numerous tables (~200). The workaround is to generate integer from string and use the same database model. – Anks Jun 25 '15 at 17:57

2 Answers2

1

I prefer two approach

  1. Using Identity column .

  2. To store your latest running number into the another table and when ever you insert new number, just increase the count by 1. And for new strings you can take this new running number hence maintaining the UNIQUE constraint even if the previous value got deleted.

Edit: After getting your requirement i do not think as the above step will suits you. This is unrealistic to me without any changes in database structure. Will surely look forward if u find a way out.

Teju MB
  • 1,333
  • 5
  • 20
  • 37
  • I think it fits OP's request (since his latest edit). Recording a distinct int for each unique guid until there are no more ints is the only way I can think of mapping used guids to integers. This table should not allow deletes to preserve the mapping. – 1010 Jun 25 '15 at 03:56
0

Updated

After knowing that we are dealing with 1 million records, would suggest ROWID instead of hashing.

SQL> create table temp as
select 'String1' s from dual union
select 'String2' s from dual union
select 'String3' s from dual;

Table created.

SQL> with t as (
  select s, dump(rowid) dump from temp
)
select s, rowid, replace(substr(dump,instr(dump,': ')+2),',') from t;

S       ROWID              REPLACE(SUBSTR(DUMP,INSTR(DUMP,':')+2),',')
------- ------------------ ----------------------------------------------------------
String1 AAAl4EAAEAAACUrAAA 0294410374300
String2 AAAl4EAAEAAACUrAAB 0294410374301
String3 AAAl4EAAEAAACUrAAC 0294410374302

Previous

Sounds like ORA_HASH is the most suitable. https://docs.oracle.com/cd/B12037_01/server.101/b10759/functions097.htm

SQL> with t as (
  select 'String1' s from dual union
  select 'String2' s from dual union
  select 'String3' s from dual
)
select t.s, ora_hash(t.s) from t;

S       ORA_HASH(T.S)
------- -------------
String1    1383541721
String2    1227763854
String3    3134930256

Other alternatives listed below, but they need additional step to convert alphabets to numeric.

HG Tok
  • 73
  • 5
  • Thanks Tok. My strings are 128 bit GUID's. If I'll use hash function to generate 32 bit decimal from these GUID's, there are chances of collision. I don't need to convert integers back to string. I am thinking of a way to split 128 bit GUID to 4 strings of 32 bit each, generate integer of each split string and then combine those integers in a way to generate unique id. Is there an algorithm that guarantees uniqueness for this process ? – Anks Jun 24 '15 at 04:16
  • Is it feasible if i use a java function that generates unique integer from 128 bit GUID strings and then compile this function as SQL Server/Oracle stored procedure ? – Anks Jun 24 '15 at 17:47
  • Also, the maximum number of strings for which I need unique integer are only in the range of 1 million. Can I use hash function safely if I know the count of strings will never exceed 1 million. – Anks Jun 24 '15 at 17:51
  • Generally, hashing will have a small chance of collision happening. http://preshing.com/20110504/hash-collision-probabilities/ Would suggest to use ROWID approach instead since it give you a distinct value within the same table. (Take note of the scenarios it changes) – HG Tok Jun 25 '15 at 02:08