The PostgreSQL types bytea
and bit varying
sound similar:
bytea
stores binary strings.bit varying
stores strings of 1's and 0's.
The documentation does not mention a maximum size for either. Is it 1GB like character varying
?
I have two separate use cases, both over a table with millions of rows:
Storing MD5 hashes
That would be a bytea
with a length of 16 bytes or a bit(128)
. It would be used for:
- Deduplication: Heavy use of
GROUP BY
, with an index I suppose. - Querying with
WHERE md5 =
for exact matches only. - Displaying as a hex string for human use.
Storing arbitrary binary data
Strings of binary data of varying length up to 4kB for:
- Bitwise operations to find the strings matching a certain mask. Example at the end of this post.
- Extracting some bytes, for instance get the integer value of the byte 14 in my string.
- Some deduplication.
Working example for the bitwise operation, using bit varying
. The mask is X'00FF00' and the it returns only the row X'AAAAAA'. I shortened the strings for the example but it would be over their full length, up to 4kB. Is it possible to do something similar with bytea
?
CREATE TABLE test1 (mystring bit varying);
INSERT INTO test1 VALUES (X'AAAAAA'), (X'ABCABC');
SELECT * FROM test1 WHERE mystring & X'00FF00' = X'00AA00';
Which of bytea
and bit varying
is the more appropriate?
I saw the UUID
type is made to store exactly 16 bytes, would that be any advantage to store the MD5's?