The documentation for Data.Vector.unsafeFreeze says:

Unsafe[ly] convert a mutable vector to an immutable one without copying. The mutable vector may not be used after this operation.

I would like to characterize exactly what "unsafe" means here. Experimentally, it appears to "only" imply that further modification of the originating mutable vector will cause the immutable vector(s) returned by unsafeFreeze to no longer be pure:

$ import qualified Data.Vector as V
$ import qualified Data.Vector.Mutable as MV
$ import Control.Monad.ST
$ :{
$ |runST $ do
$ |        mv <- V.thaw $ V.fromList [0..10]
$ |        v <- V.unsafeFreeze mv
$ |        MV.write mv 0 (-1)
$ |        MV.write mv 1 (-2)
$ |        v' <- V.freeze mv
$ |        v'' <- V.unsafeFreeze mv
$ |        return (v, v', v'')
$ |:}
([-1,-2,2,3,4,5,6,7,8,9,10],[-1,-2,2,3,4,5,6,7,8,9,10],[-1,-2,2,3,4,5,6,7,8,9,10])
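
For comparison, here is a minimal sketch of the same steps using the copying `V.freeze` for the first snapshot instead of `V.unsafeFreeze`. The name `snapshots` is made up for illustration, and the code assumes the `vector` package is installed. Because `V.freeze` copies, the first vector should not see the later writes.

```haskell
import Control.Monad.ST (runST)
import qualified Data.Vector as V
import qualified Data.Vector.Mutable as MV

-- Hypothetical helper: same writes as the session above, but the first
-- snapshot is taken with the copying V.freeze rather than V.unsafeFreeze.
snapshots :: (V.Vector Int, V.Vector Int)
snapshots = runST $ do
  mv <- V.thaw (V.fromList [0 .. 10])
  v  <- V.freeze mv      -- copies, so v does not share memory with mv
  MV.write mv 0 (-1)
  MV.write mv 1 (-2)
  v' <- V.freeze mv      -- second copy, taken after the writes
  return (v, v')

main :: IO ()
main = print snapshots
-- prints ([0,1,2,3,4,5,6,7,8,9,10],[-1,-2,2,3,4,5,6,7,8,9,10])
```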

I can imagine that modifying the source of an "unsafe" freeze could do all sorts of gnarly things that would lead to much worse behaviour, e.g. segfaulting. Unfortunately, I'm quickly out of my depth at the moment when attempting to read the source around the unsafe operations.

Can I rely on said impurity being the sole way in which these operations are "unsafe"?

For context: I need to implement a variety of modifying algorithms over a typically immutable data structure, and not reusing its public-facing API within that scope of internal mutability would be extremely inconvenient (since AFAICT there is no way to generically access both mutable and immutable vectors). (Ab)using unsafeFreeze when I need to use that API would be the perfect escape hatch, as long as I'm not setting myself up for much more unpleasant side effects down the road.

cemerick
  • Usually 'unsafe' means 'violates purity/transparency'. I think this violates referential transparency, hence is unsafe. – AJF Jan 04 '19 at 15:32
  • FWIW, I find `unsafeThaw` *much* scarier than `unsafeFreeze`, because the requirements for using it safely aren't as well-confined to the `IO` context. The only place I've personally seen `unsafeThaw` used is in the `fromList` mechanism of `Data.HashMap`, where it's used to avoid an extra tree traversal. That code is *frighteningly* subtle. If you use only `unsafeFreeze` (not `unsafeThaw`) and you never write to a frozen array, you can be pretty confident that everything will work as you expect. – dfeuer Jan 04 '19 at 19:35
  • Note: I conjecture that in *most* cases where one might be tempted to use `unsafeThaw`, it would actually be better to just stick to mutable vectors throughout. `HashMap.fromList` is a bit special because it uses `unsafeThaw` in the process of creating arrays that are permanently frozen at the end, never to be re-thawed. If there were some way to tie a bunch of arrays to a "token" and freeze them all at once, that would be much cleaner, and I would argue for removing `unsafeThaw` altogether. But there is no such mechanism and there's no particularly clean way to make one that I can see. – dfeuer Jan 04 '19 at 19:43

1 Answer


This usage pattern can crash: see this message for reference. The reason is that mutating an immutable array can create minor GC roots which are not actually visible to minor GC. This error only occurs if your array is in the old GC generation and the written objects are in the new generation, hence you won't trigger it with the simplest tests.

András Kovács
  • Thank you, this is sort of what I was afraid of. I'll perhaps post a follow-up to the ML, but I wonder if there are any mitigations, e.g. maybe constraining operations on the mutable vector/array to a single `-XStrict` module? – cemerick Jan 04 '19 at 16:49
  • You can a) leave the array mutable, or b) thaw and refreeze the array on writes. I don't know enough about your intended usage and API to tell which is right, but these are generally your options for remaining memory safe. – András Kovács Jan 04 '19 at 16:58
  • Please include the relevant bits from the linked message in the answer. – leftaroundabout Jan 04 '19 at 17:20
  • @leftaroundabout Can you elaborate? I included here the parts which I have found relevant. – András Kovács Jan 04 '19 at 17:28
  • I can confirm that this usage pattern will yield segfaults with nontrivial code. A simplified prototype of what I needed did produce outstanding results, but when fleshed out into its final form, the segfaults flowed. Turns out, "unsafe" really truly does mean _unsafe_. – cemerick Jan 08 '19 at 13:43
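
The two memory-safe options mentioned in the comments above might look roughly like the following. This is only a sketch under assumptions: the function names `squares`, `setAt` and `setAt'` are invented for illustration and are not from the question or the answer, and the `vector` package is assumed to be installed. `V.create` keeps the vector mutable until it is finished and never exposes the mutable handle, while `V.thaw`/`V.freeze` (or `V.modify`) copy around each write so that no frozen vector is ever mutated.

```haskell
import Control.Monad (forM_)
import Control.Monad.ST (runST)
import qualified Data.Vector as V
import qualified Data.Vector.Mutable as MV

-- Option (a): stay mutable for the whole construction. V.create freezes
-- the result once, and the mutable vector cannot escape the ST block.
squares :: Int -> V.Vector Int
squares n = V.create (do
  mv <- MV.replicate n 0
  forM_ [0 .. n - 1] (\i -> MV.write mv i (i * i))
  return mv)

-- Option (b): thaw and refreeze on each write. Both steps copy, so the
-- input vector is never touched.
setAt :: Int -> a -> V.Vector a -> V.Vector a
setAt i x v = runST $ do
  mv <- V.thaw v
  MV.write mv i x
  V.freeze mv

-- Same result via V.modify, which runs the mutation on a copy unless the
-- library can tell that updating in place is safe.
setAt' :: Int -> a -> V.Vector a -> V.Vector a
setAt' i x = V.modify (\mv -> MV.write mv i x)

main :: IO ()
main = do
  let v = V.fromList [0, 1, 2 :: Int]
  print (squares 4)        -- [0,1,4,9]
  print (setAt 0 (-1) v)   -- [-1,1,2]
  print (setAt' 0 (-1) v)  -- [-1,1,2]
  print v                  -- [0,1,2], the original is unchanged
```

Option (b) costs a copy per update, so batching several writes inside one `V.modify` or one `runST` block is usually preferable to calling `setAt` in a loop.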