12

One of our software-projects uses a PostgreSQL-table with a column 'guid' of type bytea.

This is used with hibernate 3.3.2.GA with PostgreSQL 8.4, which serializes the java UUID type using java object serialization. The result is a value like the following escape format bytea literal:

'\254\355\000\005sr\000\016java.util.UUID\274\231\003\367\230m\205/\002\000\002‌​J\000\014leastSigBitsJ\000\013mostSigBitsxp\273\222)\360*r\322\262u\274\310\020\3‌​42\004M '

... which we cannot easily use in a query as select or condition to retrieve relevant rows.

Does anyone have a way to read or use the bytea-column in the select- or where-parts of a query (e.g. via psql or pgadmin3), without setting up some hibernate-query?

Craig Ringer
  • 307,061
  • 76
  • 688
  • 778
FvHovell
  • 171
  • 1
  • 1
  • 6

3 Answers3

10

Update: See edit to question, this answer applies to the commonplace 16-byte serializations of uuid; the question was amended to reflect java serialization.


Interesting problem. I landed up writing a simple C extension to do it efficiently, but it's probably more sensible to use the PL/Python version below.

Because uuid is a fixed sized type and bytea is varlena you can't just create cast ... as implicit to binary-coerce them, because the variable length field header would get in the way.

There's no built-in function for bytea input to return a uuid. It'd be a handy thing to have, but I don't think anyone's done it yet.

Simplest way

Update: There's actually a simple way to do this. bytea in hex form is actually a valid uuid literal once the \x is stripped off, because uuid_in accepts plain undecorated hex without - or {}. So just:

regress=> SET bytea_output = 'hex';
SET
regress=> SELECT CAST( substring(CAST (BYTEA '\x0FCC6350118D11E4A5597DE5338EB025' AS text) from 3) AS uuid);
              substring               
--------------------------------------
 0fcc6350-118d-11e4-a559-7de5338eb025
(1 row)

It involves a couple of string copies and a hex encode/decode cycle, but it'll be tons faster than any of the PL answers I suggested earlier, though slower than C.

Other options

Personally I recommend using PL/Perl or pl/pythonu. I'll follow up with an example.

Assuming your uuid is the hex-format bytea literal:

'\x0FCC6350118D11E4A5597DE5338EB025'

you could turn it into a uuid type with:

PL/Perl

create language plperlu;

create or replace function to_uuid(bytea) returns uuid language plperlu immutable as $$
use Data::UUID;
my $ug = new Data::UUID;
my $uuid = $ug->from_hexstring(substr($_[0],2));
return $ug->to_string($uuid);
$$
SET bytea_output = hex;

SELECT to_uuid(BYTEA '\x0FCC6350118D11E4A5597DE5338EB025');

PL/Python

It's probably faster and cleaner in Python because the PL/Python interface passes bytea as raw bytes not as hex strings:

CREATE LANGUAGE plpythonu;

CREATE or replace function to_uuid(uuidbytes bytea) 
RETURNS uuid LANGUAGE plpythonu IMMUTABLE 
AS $$
import uuid
return uuid.UUID(bytes=uuidbytes)
$$;

SELECT to_uuid(BYTEA '\x0FCC6350118D11E4A5597DE5338EB025');

In C, just for kicks. Ugly hack.

You can see the C extension module here.

But really, I mean it about it being ugly. If you want it done properly in C, it's best to actually patch PostgreSQL rather than use an extension.

Craig Ringer
  • 307,061
  • 76
  • 688
  • 778
  • Perhaps I should have mentioned more clearly that the contents of the bytea column is the java-serialized version of an UUID-instance, e.g.: `select guid from documents limit 1;` `"\254\355\000\005sr\000\016java.util.UUID\274\231\003\367\230m\205/\002\000\002J\000\014leastSigBitsJ\000\013mostSigBitsxp\273\222)\360*r\322\262u\274\310\020\342\004M "` – FvHovell Jul 23 '14 at 11:59
  • @FvHovell Er, yes. That would've been useful to say the least. This is why you should **always include sample data**. – Craig Ringer Jul 23 '14 at 12:14
2

After some trial and error I have created the following function to extract the postgresql-UUID value:

CREATE OR REPLACE FUNCTION bytea2uuid (x bytea) RETURNS uuid as $$ SELECT encode(substring(x, 73, 8) || substring(x, 65, 8), 'hex')::uuid $$ language sql;

This works by extracting the bytes used in the java long-values for leastSigBits and mostSigBits (which are stored in reversed order), than encoding to hex and casting to type 'uuid'.

Used as follows: select bytea2uuid(guid) as guid from documents limit 1;

"75bcc810-e204-4d20-bb92-29f02a72d2b2"

FvHovell
  • 171
  • 1
  • 1
  • 6
  • You'd better verify the serialVersionUid field too, make sure you're not decoding something wrong. – Craig Ringer Jul 23 '14 at 12:10
  • You are correct in the generic case, but for my purposes I know for certain that all UUID-serialization was and is performed using java 6, as that is used for all our projects. Thus in my case I do not need to check the _serialVersionUid_, as that is guaranteed to be the same for all _guid_-values. – FvHovell Jul 23 '14 at 12:28
0

This works for me:

ALTER TABLE myTable ALTER COLUMN id TYPE uuid USING CAST(ENCODE(id, 'hex') AS uuid);
jwebuser
  • 103
  • 1
  • 3
  • That is not the correct encoding asked about: an java.util.UUID serialized value such as: "\254\355\000\005sr\000\016java.util.UUID\274\231\003\367\230m\205/\002\000\002‌​J\000\014leastSigBitsJ\000\013mostSigBitsxp\273\222)\360*r\322\262u\274\310\020\3‌​42\004M " – FvHovell Jan 14 '15 at 12:42