
I am migrating from MongoDB to Postgres.

Thanks to Postgres's JSON support, it is easy to move nested documents as JSON. But the problem starts with migrating the "ID" field.

Mongo generates a large hexadecimal ID, for example 56c4100560b2d8308f4bde21.

I tried converting this to BigInt, but unfortunately its decimal value, 26397231623443762880753094891, is out of range for a 64-bit integer.
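
For example, attempting the cast directly in Postgres fails:

select 26397231623443762880753094891::bigint;
-- ERROR:  bigint out of range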

  1. The ID cannot be re-generated, as it has been interlinked between the documents everywhere.

  2. A string cannot be used as the ID field, as I am moving to Postgres + JPA and will be using an auto-generated sequence moving forward.

Is there any way I can shorten this hexadecimal value to an Int or BigInt while maintaining uniqueness?

I tried taking the modulus, but it introduces repetition.
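
For example, any two ids that agree in their low bits collide under a power-of-two modulus (the two values below are hypothetical, constructed just to show the collision):

-- 0x000000010000000000000001 and 0x000000020000000000000001 differ,
-- but both leave remainder 1 modulo 2^32
select 18446744073709551617::numeric % 4294967296;  -- 1
select 36893488147419103233::numeric % 4294967296;  -- 1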

madhairsilence
  • You could use the same hex as bytea, I suppose. – Vao Tsun Mar 27 '17 at 07:18
  • The ID should be int/bigint; unfortunately I cannot change that. – madhairsilence Mar 27 '17 at 07:58
  • Then you will probably need some mapping: create a bigint sequence to use as the ID and, while migrating the data, save an old_hex -> new_int map somewhere. – Vao Tsun Mar 27 '17 at 08:05
  • 1
    Usually, these fields are migrated to be a [`uuid` field](https://www.postgresql.org/docs/current/static/datatype-uuid.html) (this is usually well supported in JPA implementations: e.g. you can generate random UUIDs more than one way) -- when you want to use the original value as a `bigint` f.ex. (and want to use an auto-incrementing value onwards) there is a problem: if a small value appears in your existing rows, your solution will start rejecting some `INSERT` upon reaching it. – pozs Mar 27 '17 at 08:12
  • See f.ex. http://stackoverflow.com/q/41502698/1499698 and http://stackoverflow.com/q/244243/1499698 – pozs Mar 27 '17 at 08:16

1 Answer


Your best option is to migrate MongoDB's ObjectId fields to PostgreSQL `uuid` columns. Note that a UUID is wider (128 bits, vs. the ObjectId's 96), so you'll need to pad the values.

See the questions linked in the comments above for more on that.
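
A minimal sketch of that padding: an ObjectId is 24 hex digits while a uuid is 32, so left-pad with eight zeros (Postgres accepts uuid input without hyphens):

select ('00000000' || '56c4100560b2d8308f4bde21')::uuid;
-- 00000000-56c4-1005-60b2-d8308f4bde21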

If you really want to use bigints, you have two options:

1. Create completely new values

  1. create your schema (with tables, constraints, etc.)
    • in this schema, use text / varchar for your ObjectId values (for now)
  2. create foreign keys for all your relations, with ON UPDATE CASCADE for all ObjectId columns.
  3. create sequences for all tables which have ObjectId columns.
  4. update the ObjectId columns (while they are still text / varchar) with:

    UPDATE table_name
    SET    object_id_col = nextval('table_name_object_id_col_seq')::text
    

    (This will propagate the changes to the referencing tables, because the foreign keys were set up earlier with ON UPDATE CASCADE.)

  5. drop the foreign keys
  6. change the type of these ObjectId columns to bigint
  7. alter your sequences to be OWNED BY their table columns
  8. alter the tables to use nextval('table_name_object_id_col_seq') as the default value
  9. re-add the foreign keys

This method is guaranteed never to produce duplicate values during migration, and the same sequences can be used to generate new primary key values afterwards. A condensed sketch of the whole procedure follows below.
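
Here is a hypothetical end-to-end sketch of the steps above for a single pair of tables (all table, column and sequence names are made up for illustration):

-- 1.-2. schema with text ids (for now) and cascading foreign keys
create table parents (
  object_id text primary key
);

create table children (
  object_id        text primary key,
  parent_object_id text references parents (object_id) on update cascade
);

-- 3. one sequence per table that has an ObjectId column
create sequence parents_object_id_seq;
create sequence children_object_id_seq;

-- (load the MongoDB data here, with the original hex ids)

-- 4. replace the hex ids; the change cascades to children.parent_object_id
update parents  set object_id = nextval('parents_object_id_seq')::text;
update children set object_id = nextval('children_object_id_seq')::text;

-- 5. drop the foreign key
alter table children drop constraint children_parent_object_id_fkey;

-- 6. switch the columns over to bigint
alter table parents  alter column object_id        type bigint using object_id::bigint;
alter table children alter column object_id        type bigint using object_id::bigint;
alter table children alter column parent_object_id type bigint using parent_object_id::bigint;

-- 7.-8. tie the sequences to their columns and use them as defaults
alter sequence parents_object_id_seq  owned by parents.object_id;
alter sequence children_object_id_seq owned by children.object_id;
alter table parents  alter column object_id set default nextval('parents_object_id_seq');
alter table children alter column object_id set default nextval('children_object_id_seq');

-- 9. re-add the foreign key
alter table children
  add foreign key (parent_object_id) references parents (object_id);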

2. Use your original values in some way

Truncation will cause information loss, so you may end up with duplicate values no matter what method you try. However, you can reduce the chance of this by using f.ex. bitwise XOR (the # operator in PostgreSQL) instead of modulus.

For example, you can fold your original values like this:

  1. start with 0 (or with some other, fixed starting value)
  2. with each iteration, take the N least significant bits of the input
  3. calculate the result as <the_previous_result> # <value_from_2.>
  4. continue at 2. while there are more, unused bits (the new input is the old input without its N least significant bits)

Here is an SQL function which does that:

create or replace function hex_xor(p_hex text, p_bits int default 64, p_default bigint default 0)
  returns bigint
  language sql
  immutable
as $func$
  with recursive r as (
      -- start with the full hex string as a bit string, and p_default as the result
      select ('x' || p_hex)::varbit h, p_default r, 0 i
    union all
      -- each step strips the p_bits least significant bits off h ...
      select case
               when bit_length(h) <= p_bits then varbit ''
               else substring(h for bit_length(h) - p_bits)
             end,
             -- ... and XORs them into the running result
             r # case
               when bit_length(h) <= p_bits then h::bit(64)::bigint
               else substring(h from bit_length(h) - p_bits + 1 for p_bits)::bigint
             end,
             i + 1
      from   r
      where  bit_length(h) > 0
  )
  -- return the result of the last iteration
  select   r
  from     r
  order by i desc
  limit    1
$func$;

This assumes that the p_hex parameter really is in hex format and that the p_bits parameter is never greater than 64.

But if you just use this as is, you may later end up with conflicting values upon INSERT. What you can do, f.ex., is use:

select -hex_xor('56c4100560b2d8308f4bde21', 63)

upon migration. This way the migrated ObjectIds will always be negative values, and later-generated primary keys (f.ex. from a sequence) will always be positive.
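
Put together, a hypothetical migration step could look like this (the table, column and sequence names are made up; new rows then draw positive ids from the sequence, so they cannot collide with the migrated, negative ones):

-- fold the stored hex ids into negative bigints (63 bits keeps the sign bit free)
alter table my_table alter column id type bigint
  using -hex_xor(id, 63);

-- new rows get positive ids from a sequence
create sequence my_table_id_seq owned by my_table.id;
alter table my_table alter column id set default nextval('my_table_id_seq');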

Rextester demo: http://rextester.com/RCVFN77368

pozs