I have a Haskell application that, as one of many steps, needs to store and retrieve raw binary blob data in a database. I'm not completely above deciding to, instead, store that data in plain disk files, but that does start leading to an additional round of permissions issues, so right now I want to go with the database.
I've created a table with a column of type bytea
.
I have a Lazy Bytestring in memory.
When I make a call like this
run conn "INSERT INTO documents VALUES (?)" [toSql $ rawData mydoc]
postgres gets a bit angry at the data. The exact error message is
invalid byte sequence for encoding \"UTF8\": 0xcf72
I also know beyond doubt that I have NUL values in the data stream. So, with all of that in mind, what is the correct way to encode the data safely for insertion?
Updated
Here is the description for my table
db=> \d+ documents
Table "public.documents"
Column | Type | Modifiers | Storage | Description
-----------------+-----------------------------+-----------+----------+-------------
id | character varying(16) | not null | extended |
importtime | timestamp without time zone | not null | plain |
filename | character varying(255) | not null | extended |
data | bytea | not null | extended |
recordcount | integer | not null | plain |
parsesuccessful | boolean | not null | plain |
Indexes:
"documents_pkey" PRIMARY KEY, btree (id)
This is the full text of a module that demonstrates the current problem I'm having after adding jamsdidh's code. My error message has changed from the encoding problem above to "invalid input syntax for type bytea".
module DBMTest where
import qualified Data.Time.Clock as Clock
import Database.HDBC.PostgreSQL
import Database.HDBC
import Data.ByteString.Internal
import Data.ByteString hiding (map)
import Data.Char
import Data.Word8
import Numeric
exampleData = pack ([0..65536] :: [Word8]) :: ByteString
safeEncode :: ByteString -> ByteString
safeEncode x = pack (convert' =<< unpack x)
where
convert' :: Word8 -> [Word8]
convert' 92 = [92, 92]
convert' x | x >= 32 && x < 128 = [x]
convert' x = 92:map c2w (showIntAtBase 8 intToDigit x "")
runTest = do
conn <- connectPostgreSQL "dbname=db"
t <- Clock.getCurrentTime
withTransaction conn
(\conn -> run conn
"INSERT INTO documents (id, importTime, filename, data, recordCount, parseSuccessful) VALUES (?, ?, ?, ?, ?, ?)"
[toSql (15 :: Int),
toSql t,
toSql ("Demonstration data" :: String),
toSql $ safeEncode exampleData,
toSql (15 :: Int),
toSql (True :: Bool)])