13

I'm wanting to write a big chunk of C# code using immutable F#. It's a device monitor, and the current implementation works by constantly getting data from a serial port and updating member variables based on new data. I'd like to transfer that to F# and get the benefits of immutable records, but my first shot at a proof-of-concept implementation is really slow.

open System
open System.Diagnostics

type DeviceStatus = { RPM         : int;
                      Pressure    : int;
                      Temperature : int }

// I'm assuming my actual implementation, using serial data, would be something like 
// "let rec UpdateStatusWithSerialReadings (status:DeviceStatus) (serialInput:string[])".
// where serialInput is whatever the device streamed out since the previous check: something like
// ["RPM=90","Pres=50","Temp=85","RPM=40","Pres=23", etc.]
// The device streams out different parameters at different intervals, so I can't just wait for them all to arrive and aggregate them all at once.
// I'm just doing a POC here, so want to eliminate noise from parsing etc.
// So this just updates the status's RPM i times and returns the result.
let rec UpdateStatusITimes (status:DeviceStatus) (i:int) = 
    match i with
    | 0 -> status
    | _ -> UpdateStatusITimes {status with RPM = 90} (i - 1)

let initStatus = { RPM = 80 ; Pressure = 100 ; Temperature = 70 }
let stopwatch = new Stopwatch()

stopwatch.Start()
let endStatus = UpdateStatusITimes initStatus 100000000
stopwatch.Stop()

printfn "endStatus.RPM = %A" endStatus.RPM
printfn "stopwatch.ElapsedMilliseconds = %A" stopwatch.ElapsedMilliseconds
Console.ReadLine() |> ignore

This runs in about 1400 ms on my machine, whereas the equivalent C# code (with mutable member variables) runs in around 310 ms. Is there any way to speed this up without losing the immutability? I was hoping that the F# compiler would notice that initStatus and all the intermediate status variables were never reused, and thus simply mutate those records behind the scene, but I guess not.

Dax Fohl
  • 10,654
  • 6
  • 46
  • 90
  • I'm not sure I understand what this is intended to do, but perhaps you can reduce allocation by only returning the part that changes. In this example, Pressure and Temperature are constant, so why copy them? – Daniel Apr 14 '11 at 19:05
  • It may be easier to help if you provide complete code, since your POC is a bit contrived. – Daniel Apr 14 '11 at 19:07
  • Andrey's point about `DateTime` makes me wonder if changing `DeviceStatus` to a struct would help performance. – Daniel Apr 14 '11 at 19:09
  • 1
    As a sanity check, you *were* measuring a release build, not a debug build, right? While the C# compiler does next to no optimization for release builds, for F# building with optimizations on can make orders of magnitude of difference. – ildjarn Apr 14 '11 at 19:57
  • @ildjarn: Yeah, release build, but thanks for the info anyway as I wasn't aware of that. – Dax Fohl Apr 14 '11 at 20:21

4 Answers4

13

In the F# community, imperative code and mutable data aren't frowned upon as long as they're not part of your public interface. I.e., using mutable data is fine as long as you encapsulate it and isolate it from the rest of your code. To that end, I suggest something like:

type DeviceStatus =
  { RPM         : int
    Pressure    : int
    Temperature : int }

// one of the rare scenarios in which I prefer explicit classes,
// to avoid writing out all the get/set properties for each field
[<Sealed>]
type private DeviceStatusFacade =
    val mutable RPM         : int
    val mutable Pressure    : int
    val mutable Temperature : int
    new(s) =
        { RPM = s.RPM; Pressure = s.Pressure; Temperature = s.Temperature }
    member x.ToDeviceStatus () =
        { RPM = x.RPM; Pressure = x.Pressure; Temperature = x.Temperature }

let UpdateStatusITimes status i =
    let facade = DeviceStatusFacade(status)
    let rec impl i =
        if i > 0 then
            facade.RPM <- 90
            impl (i - 1)
    impl i
    facade.ToDeviceStatus ()

let initStatus = { RPM = 80; Pressure = 100; Temperature = 70 }
let stopwatch = System.Diagnostics.Stopwatch.StartNew ()
let endStatus = UpdateStatusITimes initStatus 100000000
stopwatch.Stop ()

printfn "endStatus.RPM = %d" endStatus.RPM
printfn "stopwatch.ElapsedMilliseconds = %d" stopwatch.ElapsedMilliseconds
stdin.ReadLine () |> ignore

This way, the public interface is unaffected – UpdateStatusITimes still takes and returns an intrinsically immutable DeviceStatus – but internally UpdateStatusITimes uses a mutable class to eliminate allocation overhead.

EDIT: (In response to comment) This is the style of class I would normally prefer, using a primary constructor and lets + properties rather than vals:

[<Sealed>]
type private DeviceStatusFacade(status) =
    let mutable rpm      = status.RPM
    let mutable pressure = status.Pressure
    let mutable temp     = status.Temperature
    member x.RPM         with get () = rpm      and set n = rpm      <- n
    member x.Pressure    with get () = pressure and set n = pressure <- n
    member x.Temperature with get () = temp     and set n = temp     <- n
    member x.ToDeviceStatus () =
        { RPM = rpm; Pressure = pressure; Temperature = temp }

But for simple facade classes where each property will be a blind getter/setter, I find this a bit tedious.

F# 3+ allows for the following instead, but I still don't find it to be an improvement, personally (unless one dogmatically avoids fields):

[<Sealed>]
type private DeviceStatusFacade(status) =
    member val RPM         = status.RPM with get, set
    member val Pressure    = status.Pressure with get, set
    member val Temperature = status.Temperature with get, set
    member x.ToDeviceStatus () =
        { RPM = x.RPM; Pressure = x.Pressure; Temperature = x.Temperature }
ildjarn
  • 62,044
  • 9
  • 127
  • 211
  • Awesome, this is exactly what I was looking for; someone to state definitively whether it was possible or not to get mutable performance from an immutable structure, and, given that it's not, demonstrate the F# community best practices for this problem. Thanks! – Dax Fohl Apr 14 '11 at 22:56
  • @ildjarn : What do you mean "explicit classes"? What do you "normally" prefer? What's different about this situation to make you want to use explicit classes? – Dax Fohl Apr 15 '11 at 01:34
  • 1
    @Dax : Edited to demonstrate. – ildjarn Apr 15 '11 at 01:47
  • 2
    @Dax : Note that you could use another record with mutable fields rather than a class and achieve the exact same effect, but the drawback is that if you use the same field names as your non-facade underlying record, then they become ambiguous every time you make a record or use record pattern matching, making using either record more awkward. – ildjarn Apr 15 '11 at 01:53
  • @ildjarn : I was just trying that. The other downside of that is that (after looking through red gate reflector) I found that mutable record fields are exposed as .Net Properties, while mutable class fields are exposed as .Net Fields, which are much more efficient. In my experiments the mutable records are running about twice as slow as the mutable classes. Also note your solution is about 3* as fast in Release mode vs Debug mode, as per your comment on the original post. – Dax Fohl Apr 15 '11 at 03:00
  • 1
    @Dax : I tried with a mutable record, I got the exact same performance as with the classes posted in my answer. Re: fields vs. properties, as I recall, the CLR's JIT compiler's heuristics for inlining are based purely on method size (something like the IL must be <= 16 bytes), so simple blind getter/setter properties are always inlined and consequently the same speed as direct field accesses. I'm curious to see your mutable record definition that performed slower, because that shouldn't be the case. – ildjarn Apr 15 '11 at 04:13
  • @ildjarn: I was using F5 instead of Ctrl-F5. Seems that even in Release mode, those JIT optimizations don't occur if a debugger is attached. – Dax Fohl Apr 15 '11 at 15:06
  • 3
    @Dax : Ah, yep, that would certainly do it. :-] To be clear, the JIT compiler does *zero* optimizations if a process is launched with a debugger attached; if one launches a process then attaches a debugger afterwards, optimizations are enabled. – ildjarn Apr 15 '11 at 15:22
  • 4
    F# 3.0 will introduce a new syntax that allows properties to have a backing store without a separate `let` statement. `member val Property2 = "" with get, set` http://msdn.microsoft.com/en-us/library/dd483467(v=vs.110).aspx – gradbot Dec 31 '11 at 17:15
8

This won't answer your question, but it's probably worth stepping back and considering the big picture:

  1. What do you perceive as the advantage of immutable data structures for this use case? F# supports mutable data structures, too.
  2. You claim that the F# is "really slow" - but it's only 4.5 times slower than the C# code, and is making more than 70 million updates per second... Is this likely to be unacceptable performance for your actual application? Do you have a specific performance target in mind? Is there reason to believe that this type of code will be the bottleneck in your application?

Design is always about tradeoffs. You may find that for recording many changes in a short period of time, immutable data structures have an unacceptable performance penalty given your needs. On the other hand, if you have requirements such as keeping track of multiple older versions of a data structure at once, then the benefits of immutable data structures may make them attractive despite the performance penalty.

kvb
  • 54,864
  • 2
  • 91
  • 133
  • 2
    'only 4.5 times slower' - You call that 'only'? :) – Robert Jeppesen Apr 14 '11 at 19:45
  • I think mostly for making UI updates easier to multithread. Currently everything uses the age-old observer pattern, which works well in a single-threaded env, but if I try to multithread this then the underlying DeviceStatus could change while rendering. I feel like a better approach is to use an immutable DeviceStatus to update the whole screen at once, which would allow the serial parser and the UI updater to run in separate threads. I'm now leaning toward having a mutable record at the bottom for performance, but then copying to an immutable one for each frame. – Dax Fohl Apr 14 '11 at 19:51
  • 1
    4.5 times 100 ms isn't bad. 4.5 times 30 secs is horrible. – Daniel Apr 14 '11 at 19:51
  • I'm not *positive* this is a bottleneck, but the end application would have to run on a much smaller processor, and data comes in pretty fast. Plus the record will be a lot bigger than three fields, so a lot more info to copy. – Dax Fohl Apr 14 '11 at 19:57
  • @Dax: You realize the multithreading design you're talking about here is completely incompatible with the compiler optimization you were hoping for in your original question, right? – Dax Fohl Apr 14 '11 at 20:23
  • 2
    @Robert - it's all relative... 5% slower might be too much in some circumstances, but in other cases a factor of 10 might be acceptable. I don't think that a 4.5x slowdown is terrible considering the number of objects that have to be created in the immutable scenario compared to the trivial amount of work being done in the loop in the mutable case. Performing any amount of calculation in the loop should bring the numbers closer together. – kvb Apr 14 '11 at 20:42
  • @Dax : Are you talking to yourself? o.O – ildjarn Apr 14 '11 at 21:46
  • @ildjarn : Yes, I find myself easy to talk to. – Dax Fohl Apr 14 '11 at 22:58
7

I suspect the performance problem you are seeing is due to the block memory zeroing involved when cloning the record (plus a negligible time for allocating it and subsequently garbage collecting it) in every iteration of the loop. You could rewrite your example using a struct:

[<Struct>]
type DeviceStatus =
    val RPM : int
    val Pressure : int
    val Temperature : int
    new(rpm:int, pres:int, temp:int) = { RPM = rpm; Pressure = pres; Temperature = temp }

let rec UpdateStatusITimes (status:DeviceStatus) (i:int) = 
    match i with
    | 0 -> status
    | _ -> UpdateStatusITimes (DeviceStatus(90, status.Pressure, status.Temperature)) (i - 1)

let initStatus = DeviceStatus(80, 100, 70)

The performance will now be close to that of using global mutable variables or by redefining UpdateStatusITimes status i as UpdateStatusITimes rpm pres temp i. This will only work if your struct is no more than 16 bytes long as otherwise it will get copied in the same sluggish manner as the record.

If, as you've hinted at in your comments, you intend to use this as part of a shared-memory multi-threaded design then you will need mutability at some point. Your choices are a) a shared mutable variable for each parameter b) one shared mutable variable containing a struct or c) a shared facade object containing mutable fields (like in ildjarn's answer). I would go for the last one since it is nicely encapsulated and scales beyond four int fields.

petebu
  • 1,827
  • 12
  • 15
  • That's a bit faster, but still immutability and the inherent copying of data from the old struct/record to the new one is the bottleneck, as adding more fields to DeviceStatus causes a linear slowdown. BTW I'm not "reading" device status; the device is streaming status packets to me over serial, so I have to update my data with each new packet, which come in very fast, hence the definition of UpdateStatusWithSerialReadings. – Dax Fohl Apr 14 '11 at 20:08
  • @Dax. Linear in terms of the number of fields? Well, yes, but the same is true when updating global variables or function arguments. Yes, with globals you can check whether the data has changed and update it only in that case, but I would expect the read and comparison to be no faster than just a write. – petebu Apr 14 '11 at 20:26
  • Also, it's not copying that's the bottleneck - copying always occurs (except in the case mentioned above which is slower - I just tried). It's the allocation and garbage collection. – petebu Apr 14 '11 at 20:45
  • 1
    @petebu : I can reproduce the speed difference between FSI and release builds, I'm **very** curious about that as well. – ildjarn Apr 14 '11 at 22:43
  • @petebu, @ildjarn: I also was able to reproduce the speed difference. Interesting that it varies with the number of fields. For a struct with four fields or fewer I get around 179 ms, but as soon as I add the fifth field it shoots up to 1072 ms, and then increases by about 100ms with each additional field. This odd behavior is only on FSI; compiled versions increase linearly as one would think. – Dax Fohl Apr 15 '11 at 00:14
  • 2
    @ildjarn, @Dax: http://stackoverflow.com/questions/2437925/net-why-is-struct-better-with-being-less-than-16-bytes It appears that structs longer than 16 bytes aren't copied using MOV instructions but using a block memory copy. This would explain why having more than four int fields has such a drastic effect on performance in FSI. But it doesn't explain why the stand-alone compiler is different. – petebu Apr 15 '11 at 03:47
  • I still stand by my comment that copying is always involved - it's just that not all copying is the same. :-) It also means the heap overhead of the record version is much smaller than I thought - the biggest cost is probably the block memory copy involved. Now, is there a badge for downvoting my own answer? – petebu Apr 15 '11 at 03:50
  • 1
    @petebu, @ildjarn: Are you using F5 or Ctrl-F5? On my machine, hitting Ctrl-F5 (Start Without Debugging) gives FSI performance, so it seems that it's just something the debugger is doing that is preventing the block memory copy (even in Release mode). – Dax Fohl Apr 15 '11 at 15:03
  • @Dax: Aaah. That's it. Now I feel foolish. I've rewritten my answer in light of the comments. – petebu Apr 15 '11 at 17:41
  • @petebu : This code is still slower than the OP's for me (release + Ctrl+F5). Given that it's apparently faster for you and Dax, I wonder if there's something pathological going on in the CLR JITter -- maybe .NET 4.0 SP1 specific, maybe x64 specific -- who knows? In any case, since I seem to be the odd one out, downvote removed :-] – ildjarn Apr 16 '11 at 00:18
  • @DaxFohl: "That's a bit faster". 6x faster here. – J D Dec 31 '11 at 07:54
5

Using a tuple as follows is 15× faster than your original solution:

type DeviceStatus = int * int * int

let rec UpdateStatusITimes (rpm, pressure, temp) (i:int) = 
    match i with
    | 0 -> rpm, pressure, temp
    | _ -> UpdateStatusITimes (90,pressure,temp) (i - 1)

while true do
  let initStatus = 80, 100, 70
  let stopwatch = new Stopwatch()

  stopwatch.Start()
  let rpm,_,_ as endStatus = UpdateStatusITimes initStatus 100000000
  stopwatch.Stop()

  printfn "endStatus.RPM = %A" rpm
  printfn "Took %fs" stopwatch.Elapsed.TotalSeconds

BTW, you should use stopwatch.Elapsed.TotalSeconds when timing.

J D
  • 48,105
  • 13
  • 171
  • 274