Unsigned Big-endian encoding of type Integer

Question

I am attempting to follow up on How to convert a Integer to a ByteString in Haskell. My objective is to create a function toBytes :: Integer -> Int -> ByteString which returns a big-endian encoding of the magnitude of a potentially large Integer. The resulting ByteString (lazy or strict I am not too sure yet) should be of length indicated by the Int argument. In essence, I would like to replicate the semantics of the python call (assuming value is a non-negative integer):

value.to_bytes(numBytes, byteorder='big', signed=False)

I do not need to have everything spelt out of course, I am only looking for guidance. For example, I would be very happy to have anything which brings me closer to my end goal with related semantics such java's toByteArray method on BigInteger (big-endian, two's complement, minimal size) or C#'s ToByteArray on BigInteger (little-endian, two's complement, minimal size). I can take care of endianness, and negative number, and add whatever padding is required...

The post I have indicated recommends packages such as Data.Binary and Data.Serialize but the encode function does not seem to follow a clear encoding standard:

Prelude Data.Binary> encode (0xfffefd :: Integer)
"\NUL\NUL\255\254\253" 

Prelude Data.Binary> encode (0xfffefdfc :: Integer)
"\SOH\SOH\NUL\NUL\NUL\NUL\NUL\NUL\NUL\EOT\252\253\254\255"

The type of encoding seems to change as the Integer gets larger, going from big-endian to little-endian for example. So I am wondering, whether it is a wise choice for me to try and build my own function on top of the encode function provided by these packages. I worry my code will rely on implementation details which may be specific to my host endianness and/or package version.

I am only doing this because I want to learn Haskell. So as a simple exercise, I am attempting to create a type with the same API as java's BigInteger (albeit with a better toByteArray):

{-# LANGUAGE GeneralizedNewtypeDeriving #-}

import qualified Data.ByteString.Lazy as L
import Data.Hashable

type ByteArray = L.ByteString
newtype Sign = Sign Int 
newtype NumBytes = NumBytes Int 


newtype Number = Number Integer
        deriving (Eq, Ord, Num, Real, Enum, Integral, Hashable)

fromBytes :: Sign   -> ByteArray -> Number
toBytes   :: Number -> NumBytes  -> ByteArray

Any guidance is warmly appreciated

not trying to discourage you, but I'm not sure ByteString et. al. is the easiest place to start learning haskell (http://learnyouahaskell.com/chapters is a good start though)... btw, the documentation of [Data.Binary states](https://hackage.haskell.org/package/binary-0.8.4.1/docs/Data-Binary.html): `Values encoded using the Binary class are always encoded in network order (big endian) ` — mb21, Oct 23 '16 at 15:27
Also, the return value should be as long as necessary to encode the value of the `Integer` argument, not necessarily fixed by another argument. — chepner, Oct 23 '16 at 15:30
" In essence, I would like to replicate the semantics of the python call " - what is the semantics of the python function? You could at least provide a link to the documentation, or better yet state the expected semantics yourself. What should your function do for `toBytes (10^10) 1`? It can't possible fit in one byte. Finally, just try to produce a `[Word8]` and then use `pack` and worry about performance later (if and when you discover it is insufficient) - this will make reasoning about the semantics of your algorithm easier. — user2407038, Oct 23 '16 at 15:33
@user2407038 my apologies, I wrongly assumed the signature was enough to infer what the function does. It returns a binary representation (big endian) of the number as an array of bytes of the desired `length` (which needs to be high enough or else the function will throw). Note that because `signed=False` my `length` argument need not accommodate a leading `0x00` byte to differentiate the encoding with that of a negative number in cases when the leading bit is 1. As `length` gets larger, more `0x00` bytes are padded to the left. — Sven Williamson, Oct 23 '16 at 16:02
@SvenWilliamson Take at a look at the source for [`Data.Binary`](http://hackage.haskell.org/package/binary-0.5.0.2/docs/src/Data-Binary.html#unroll) - it actually contains the function you want (`unroll :: Integer -> [Word8]`) minus the padding, which should be simple. You can also look at the source (and comments) for `instance Binary Integer` to see why it behaves as it does. — user2407038, Oct 23 '16 at 16:38
@user2407038 Thank your answer works for me. The key idea behind `unroll`, is to use `fromInteger` of instance `Num Word8` to get the first byte. Then use `shiftR` of instance `Bits Integer` and glue everything together with with `unfoldr`. And yes, I will use `pack` and `unpack` for now. — Sven Williamson, Oct 25 '16 at 15:07
@mb21 Thank you, I will work with `[Word8]` as suggested elsewhere :) — Sven Williamson, Oct 25 '16 at 15:10
@chepner, yes thank you, it is understood that the function should throw if the length argument is too small, while padding of `0x00` bytes to the left occur if the length argument is large — Sven Williamson, Oct 25 '16 at 15:13
@user2407038 needless to say if you were to write an answer I would gladly accept it :) — Sven Williamson, Oct 25 '16 at 15:21
Actually, I have had to define my own `unfold` as `unfoldr` of `Data.List` leads to the wrong byte ordering — Sven Williamson, Oct 25 '16 at 17:23

Unsigned Big-endian encoding of type Integer

0 Answers0