1

I am learning Scala in my free time - and as a learning exercise, I translated some OCaml code that I wrote about in another StackOverflow question to Scala. Since I am new to Scala, I'd appreciate some advice...

But before asking my questions - here's the original OCaml code:

let visited = Hashtbl.create 200000

let rec walk xx yy =
    let addDigits number =
        let rec sumInner n soFar =
            match n with
            | x when x<10  -> soFar+x
            | x -> sumInner (n/10) (soFar + n mod 10) in
        sumInner number 0 in
    let rec innerWalk (totalSoFar,listOfPointsToVisit) =
        match listOfPointsToVisit with
        | [] -> totalSoFar
        | _ ->
            innerWalk (
                listOfPointsToVisit
                (* remove points that we've already seen *)
                |> List.filter (fun (x,y) ->
                    match Hashtbl.mem visited (x,y) with
                    | true -> false (* remove *)
                    | _    -> (Hashtbl.add visited (x,y) 1 ; true))
                (* increase totalSoFar and add neighbours to list *)
                |> List.fold_left
                    (fun (sum,newlist) (x,y) ->
                        match (addDigits x)+(addDigits y) with
                        | n when n<26 ->
                            (sum+1,(x+1,y)::(x-1,y)::(x,y+1)::(x,y-1)::newlist)
                        | n -> (sum,newlist))
                    (totalSoFar,[])) in
    innerWalk (0,[(xx,yy)])

let _ =
    Printf.printf "Points: %d\n" (walk 1000 1000)

...and here's the Scala code I translated it to:

import scala.collection.mutable.HashMap

val visited = new HashMap[(Int,Int), Int]

def addDigits(number:Int) = {
    def sumInner(n:Int, soFar:Int):Int =
      if (n<10)
        soFar+n
      else
        sumInner(n/10, soFar+n%10)
    sumInner(number, 0)
}

def walk(xx:Int, yy:Int) = {
    def innerWalk(totalSoFar:Int, listOfPointsToVisit:List[(Int,Int)]):Int = {
        if (listOfPointsToVisit.isEmpty) totalSoFar
        else {
            val newStep = 
                listOfPointsToVisit.
                // remove points that we've already seen
                filter(tupleCoords => {
                    if (visited.contains(tupleCoords))
                        false
                    else {
                        visited(tupleCoords)=1 
                        true
                    }
                }).
                // increase totalSoFar and add neighbours to list
                foldLeft( (totalSoFar,List[(Int,Int)]()) )( (state,coords) => {
                    val (sum,newlist) = state
                    val (x,y) = coords
                    if (addDigits(x)+addDigits(y) < 26)
                        (sum+1,(x+1,y)::(x-1,y)::(x,y+1)::(x,y-1)::newlist)
                    else
                        (sum,newlist)
                });
            innerWalk(newStep._1, newStep._2)
        }
    }
    innerWalk(0, List((xx,yy)))
}

println("Points: " + walk(1000,1000))

The Scala code compiles and works correctly, reporting the proper result.

But...

  • Unless I missed something, I found no pipe operator in Scala (i.e. the |> of OCaml and F#) so I used the corresponding list methods (filter and fold Left). In this case the end result is pretty close to the original, but I am wondering - isn't the pipe operator a generally favorable - and more generic - approach for functional-style solutions? Why isn't Scala equipped with it?

  • In Scala, I had to specifically initiate my folding state (which is a tuple of (Int, List[Int,Int]) with a type-specific empty list. In plain words, List() didn't cut it - I had to explicitly specify List[(Int,Int)](), otherwise I got a... rather difficult error message. I deciphered it based on context - it complained about Nothing - and I realized the only place in this tiny code where a type Nothing appeared could be my empty List. Still, the result is uglier, compared to OCaml... Is there anything better I can do?

  • In the same vein, OCaml was able to pass the fold's resulting tuple as an argument to innerWalk. In Scala, I had to assign to a variable and invoke the tail-recursive call with innerWalk(newStep._1, newStep._2). There appears to be no equivalence between tuples and function arguments - i.e. I can't pass a tuple of 2-arity in a function with two arguments - and similarly, I can't tuple-destructure the arguments of a function to variables (I had to explicitely assign state and coords and de-structure them inside the folding function's body. Am I missing something?

Overall, I am pleased with the result - I'd say that if we grade the OCaml code of this example at 100%, then Scala is at about 85-90% - it's a bit more verbose than OCaml, but it's much, much closer to OCaml than it is to Java. I am just wondering whether I used Scala to its full potential or whether I missed some constructs that would improve the code (more likely).

Note that I avoided mapping my original OCaml's pattern matching to Scala's, since in this case I think it was overkill - an if expression is much clearer in both places.

Thanks in advance for any help / suggestions.

P.S. A side note - I added timing instructions around the walk call (thus avoiding the startup cost of the JVM) and measured my Scala code - it runs at about 50% of OCaml's speed - which is, funnily enough, exactly the same speed I get out of Mono executing the F# equivalent code (see my original SO question to get the F# code, if you care about this sort of comparison). Since I currently work in enterprise environments, 50% speed is a price I'll gladly pay to write concise ML-like code and still get access to the vastness of the JVM/.NET ecosystems (databases, Excel-file generation, etc). Sorry OCaml, I did try you - but you can't fully "speak" Oracle :-)

EDIT 1: After the kind suggestions from @senia and @lmm, the code is significantly improved. Hoping for more advice from @lmm about how foldMap and Shapeless will additionally help :-)

EDIT 2: I cleared up the code further with flatMap from scalaz - gist is here. Unfortunately, the change also caused a massive 10x slowdown - guessing that the list concatenation done by foldMap is much slower than foldLeft's "add just one new node". Wondering how I can change the code to make the addition fast...

EDIT 3: After another suggestion from @lmm, I switched the scalaz-flatMap version from using List to using immutable.Vector: This helped a lot, bringing the speed from 10x slower back to... only 2x slower (than the original code). So, clean code or 2x speed? Decisions, decisions... :-)

Community
  • 1
  • 1
ttsiodras
  • 10,602
  • 6
  • 55
  • 71
  • If you'll split your question into 3 different questions with minimal amount of code you'll get your answers quicker. – senia Dec 04 '14 at 10:52
  • 1
    Related: http://stackoverflow.com/a/15822877/406435 – senia Dec 04 '14 at 11:01
  • @senia: I don't see an easy way to do that - I'd have to copy the same code/intro in all 3 of them, probably violating some SO rule in the process. Plus I think the question has merit as it is, in standalone form - the 3 errors faced by someone already experienced in a strong type system when he tries Scala for the 1st time. – ttsiodras Dec 04 '14 at 11:02
  • @senia: Thanks for the reference! I just tried using it - i.e. the implicit class in http://stackoverflow.com/a/15822877/406435 - but my code now fails to compile with "Tuples cannot be directly destructured in method or function parameters". Looks like the answer to my 3rd question is somehow related to the answer to the 1st? – ttsiodras Dec 04 '14 at 11:06
  • It's difficult to write 3 different answers in one. 1. There is `|>` in `scalaz` and you could create your own implementation in 3 lines of code. And what's wrong with `fold`/`flatMap`? 2. It's because of how type inference works in `scala`. There are different ways to deal with it. 3. There is `tupled` method on `FunctionN`, but it's useless with tail recursion. Sad, but you have to deconstruct tuple manually. – senia Dec 04 '14 at 11:10
  • You should try to create 3 useless, but short code samples to illustrate your questions. – senia Dec 04 '14 at 11:14
  • @senia: You seem to answer them very succinctly in less than 300 characters :-) For (1), as I said, I tried your reference, but can't compile the result, getting errors about tuple destructuring - can you show me (Gist) how to do the change in the code? For (2), I'd love to hear about the ways to deal with it... For (3): OK, I guess that's a wall then. – ttsiodras Dec 04 '14 at 11:16
  • If this was the answer why you still have questions? It's not an answer, it's the direction to answer. And I don't want to write it as ` answer. 1. You should show what you've tried so far. As I know there is no `List.filter` in `scala`. Note that you can't use `|>` for `innerWalk` call - it'll ruin tail optimization. 2. I guess in this case `List[(Int, Int)]()` is the way to go. Note that you could use `{ case ((sum,newlist), coords) => ... }` instead of `( (state,coords) => {...} )`. – senia Dec 04 '14 at 11:34
  • It's better to use a `var` with an immutable data structure, unless the reason you're using a mutable Map in a `val` is performance optimisation. Also, in general, Scalaz is the de facto standard library for functional programming; vanilla Scala doesn't provide most of the things out of the box that e.g. Haskell does, such as various operators, type classes and instances thereof. – Erik Kaplun Dec 04 '14 at 12:10

2 Answers2

5
  • Scalaz does provide a |> operator, or you can write one yourself. In general there's a lot less need for it in Scala because objects have methods, as you can see in some of your translation (e.g. somethingThatReturnsList.filter(...) where in OCaml you'd have to write somethingThatReturnsList |> List.filter(...). So it's not built into the language. But if you need it, it's out there.
  • foldLeft is a bit general; you might be able to write clearer code using e.g. Scalaz foldMap (in the case of your tuple you might also need shapeless-contrib so that the appropriate typeclass instance is derived). But fundamentally yes, Scala type inference will be less reliable than OCaml and you will find yourself having to add explicit type annotations (sometimes because of unclear Nothing error messages) - it's the price we pay for allowing traditional-OO extends inheritance.
  • You can use (innerWalk _).tupled to get a function that takes a tuple. Or you could write your functions to accept tuples and take advantage of argument auto-tupling to call them without explicit tuple syntax. But yeah, there is no generic encoding of multi-argument functions (you can use Shapeless to convert them into that form), I suspect largely because of JVM compatibility. I suspect that if the standard library were written now it would use HLists for everything and there would be an equivalence between ordinary functions and a HList representation, but this would be a very hard change to make in a backwards-compatible way.

You seem to be using quite a lot of ifs, and there are functions for some of what you're doing, e.g. visited.put(tupleCoords, 1) returns a boolean for whether a value was replaced, so you could use that as the entire body of your filter call. And as I said, if you're willing to use Scalaz the foldLeft could be rewritten as a clearer foldMap. I suspect the whole recursive loop could be expressed with a named construct, but nothing immediately comes to mind, so maybe not.

lmm
  • 17,386
  • 3
  • 26
  • 37
  • Thanks for answering! About (1): Thanks, I'll have a look at Scalaz (didn't know about it). About (2): If it's possible, I'd appreciate a gist showing how the code would look with foldMap and shapeless-contrib (no idea about either of them as of now, so an example would help). About (3): What is Shapeless? Is it from the same shapeless-contrib you referred to (2)? – ttsiodras Dec 04 '14 at 12:00
  • Shapeless is a generic and type level programming library that relies heavily on Scala's path dependent types and type level computation. It allows you to take advantage of Scala's powerful type system by making it catch more errors at compile time. – Erik Kaplun Dec 04 '14 at 12:12
  • @lmm: I added a gist here: https://gist.github.com/anonymous/9ced028ef55f5dc21058 with the code as it becomes after your `put` suggestion and @senia's "auto" tuple destructure. Any additional hints about how foldMap and Shapeless would improve it? (I am a Scala noob, haven't read about them - yet) – ttsiodras Dec 04 '14 at 12:14
  • You could avoid the nested function in `addDigits` by using a default argument to the outer function (`def sumDigits(number: Int, soFar: Int = 0) = ...`). What I was talking about was replacing the `foldLeft` with e.g. `foldMap{case (x, y) => if(addDigits(x) + addDigits(y) < 26) (1,(x+1,y)::(x-1,y)::(x,y+1)::(x,y-1)::Nil) else (0, Nil)}`. That will do the summing for you, but you'll need scalaz and probably shapeless-contrib imports in scope for the typeclasses to exist. – lmm Dec 04 '14 at 12:33
  • Shapeless is a library for (among other things) handling tuple-like things genericly; shapeless-contrib (well, shapeless-scalaz) combines it with scalaz to e.g. derive typeclasses for tuples of things that have typeclasses. Scalaz already has a `Monoid[Int]` and a `Monoid[List[Int]]`, but for my `foldMap` to work you will need a `Monoid[(Int, List[Int])]`, so I think you'll need to depend on and import shapeless-scalaz. – lmm Dec 04 '14 at 12:34
  • @lmm: Turns out I didn't need `shapeless` - here's a gist using foldMap: https://gist.github.com/anonymous/bb0f5478692e90c02496 Unfortunately, it also made the code 10x slower, probably because foldMap list concatenation is not as quick as adding a single node in front (which is what my initial code does). – ttsiodras Dec 04 '14 at 13:22
  • Ah yeah, makes sense given the data structures involved. If you use `Vector` (which is a wide tree rather than a singly-linked list) it should perform much better, perhaps even as well as your original version. (In theory it could be parallelized to go faster, but probably only for very large lists as there would be overhead) – lmm Dec 04 '14 at 14:46
  • @lmm: I just tried using Vector - here's the relevant gist: https://gist.github.com/anonymous/30dda5c5b9a52ad1ae0f ... It indeed helped, but it's still 2x slower than the original code. The code is much nicer, though. Fast speed or Clean code? I have to pick one :-) – ttsiodras Dec 04 '14 at 15:34
  • Much belated thought - scalaz `DList` may be worth trying? – lmm Feb 03 '15 at 16:54
1

I attach two alternative versions with more idiomatic Scala code (I also optimized the algorithm slightly). I don't think there's anything wrong with an imperative solution in this case, it can actually be simpler to understand.

  // Impure functional version
  def walk2(xx: Int, yy: Int) = {
    val visited = new mutable.HashSet[(Int, Int)]

    def innerWalk(totalSoFar: Int, listOfPointsToVisit: Seq[(Int, Int)]): Int = {
      if (listOfPointsToVisit.isEmpty) totalSoFar
      else {
        val newStep = listOfPointsToVisit.foldLeft((totalSoFar, Seq[(Int, Int)]())) {
          case ((sum, newlist), tupleCoords@(x, y)) =>
            if (visited.add(tupleCoords) && addDigits(x) + addDigits(y) < 26)
              (sum + 1, (x + 1, y) +: (x - 1, y) +: (x, y + 1) +: (x, y - 1) +: newlist)
            else
              (sum, newlist)
        }

        innerWalk(newStep._1, newStep._2)
      }
    }

    innerWalk(0, Seq((xx, yy)))
  }

  // Imperative version, probably fastest
  def walk3(xx: Int, yy: Int) = {
    val visited = new mutable.HashSet[(Int, Int)]()
    val toVisit = new mutable.Queue[(Int, Int)]()

    def add(x: Int, y: Int) {
      val tupleCoords = (x, y)

      if (visited.add(tupleCoords) && addDigits(x) + addDigits(y) < 26)
        toVisit += tupleCoords
    }

    add(xx, yy)
    var count = 0

    while (!toVisit.isEmpty) {
      count += 1
      val (x, y) = toVisit.dequeue()
      add(x + 1, y)
      add(x - 1, y)
      add(x, y + 1)
      add(x, y - 1)
    }

    count
  }

Edit: improved functional version, used Queue in imperative version

Jesper Nordenberg
  • 2,104
  • 11
  • 15
  • Thanks! Indeed, your imperative solution is faster than [my version](https://gist.github.com/anonymous/9ced028ef55f5dc21058) by 21% (from 310ms down to 245ms). Increasingly though, I care more about coding in (mostly) functional style - I find functional code easier to test and reason about... Of course I do mutate when it makes sense (e.g. the `visited` HashMap above) - I [know](http://users.softlab.ntua.gr/~ttsiod/mandelSSE.html) how to use mutation [quite well](http://users.softlab.ntua.gr/~ttsiod/straylight.html) - it's just that code clarity is more important to me nowadays. – ttsiodras Dec 05 '14 at 12:16
  • I really like your functional-style version - it uses (a) flatMap, (b) the "trick" to make an inline function that accepts a tuple via pattern matching, and (c) doesn't depend on "templatized emptiness". It's a bit slower than mine (flatMap is probably costing a bit more than folding single nodes up front), but regardless, it's my favorite functional-style version of the code so far :-) – ttsiodras Dec 05 '14 at 12:34
  • Improved the both versions (folding seems to be quite a bit faster than flat mapping). Personally I sometimes find imperative algorithms easier to read and understand (first do this, then do that etc.), and typically it's easier to reason about their runtime performance and memory usage. It all depends on the algorithm. :) – Jesper Nordenberg Dec 05 '14 at 12:38