I've written a short function to strip the leading whitespace from a multiline string literal and concatenate the lines, as if you'd written out several concatenated strings or a single very long one.

def stripMultiline(input : String) = 
  input.split("\n").map(_.dropWhile(_.isWhitespace).stripLineEnd).mkString

It works the way I'd expect in the REPL:

scala> val longString =
     | """
     |   one fish,
     |   two fish,
     |   red fish,
     |   blue fish
     | """

scala> stripMultiline(longString)
res0: String = one fish, two fish, red fish, blue fish

However, if I put the same code into a main method and compile it with scalac, I see something different:

package substitutions

object Main {
  def stripMultiline(input : String) = 
    input.split("\n").map(_.dropWhile(_.isWhitespace).stripLineEnd).mkString

  def main(args : Array[String]): Unit = {
    val text = 
    """
      one fish, 
      two fish, 
      red fish, 
      blue fish
    """

    val oneLine = stripMultiline(text)
    println(oneLine)
  }
}

(back in the console)

C:\KC\code\scala\sub>scala substitutions.Main
blue fish

I'm running Scala 2.10 for both the REPL and scalac. I've seen this behavior on Windows 7, both 32-bit and 64-bit. Can anybody think of why the result differs between the REPL and the compiled version? It threw me for a loop. Is this a problem in my logic, or should I be filing a bug report?

KChaloux
  • I've tried to do the same with **scala 2.10.2, OSX 10.8.4** and compiled version works as expected: *one fish, two fish, red fish, blue fish* – om-nom-nom Jun 25 '13 at 14:27
  • @om-nom-nom What version are you using? – KChaloux Jun 25 '13 at 14:28
  • Both versions are fine for me on scala 2.10.0, OSX 10.8.4. Are you using windows? This could be a line ending issue. – Noah Jun 25 '13 at 14:29
  • @Noah Yes, I am on Windows. I've had to submit a bug report that was Windows-specific before. I have a sneaking suspicion that the developers don't pay as much mind to that OS :p – KChaloux Jun 25 '13 at 14:31
  • I'm starting to bet I need to strip out `\r`s explicitly. – KChaloux Jun 25 '13 at 14:33

1 Answer

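Your problem is the line endings in your source file. On Windows it is most likely saved with \r\n (CRLF) line breaks, so the multiline string literal contains them too: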

scala> stripMultiline("\r\n      one fish, \r\n      two fish, \r\n      red fish, \r\n      blue fish\r\n    ")
"lue fish ng = "one fish,

scala> stripMultiline("\n      one fish, \n      two fish, \n      red fish, \n      blue fish\n    ")
res1: String = one fish, two fish, red fish, blue fish

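After split("\n"), every line still ends with a \r (carriage return). When the result is printed, each \r moves the cursor back to the start of the line, so later segments overwrite earlier ones and only "blue fish" remains visible.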
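
To make the stray carriage returns visible, here is a minimal sketch. It assumes the literal came from a CRLF source file; `crlfText` and the `showEscapes` helper are illustrative names, not part of the original code.

```scala
object ShowCarriageReturns {
  // Hypothetical helper: render CR and LF as visible escape sequences.
  def showEscapes(s: String): String =
    s.replace("\r", "\\r").replace("\n", "\\n")

  def main(args: Array[String]): Unit = {
    // Assumed input: roughly what a multiline literal contains
    // when the source file uses CRLF line endings.
    val crlfText = "\r\n  one fish, \r\n  two fish\r\n"

    // Splitting on "\n" alone leaves a trailing "\r" on each piece.
    crlfText.split("\n").foreach { piece =>
      println("[" + showEscapes(piece) + "]")
    }
  }
}
```

Each printed piece should end in a visible `\r`, which is the character that sends the cursor back to column zero when the joined string is printed.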

As a hotfix, you could split your lines on (\r)?\n instead:

def stripMultiline(input : String) =
  input.split("(\r)?\n").map(_.dropWhile(_.isWhitespace).stripLineEnd).mkString
stripMultiline: (input: String)String

scala> stripMultiline("\r\n      one fish, \r\n      two fish, \r\n      red fish, \r\n      blue fish\r\n    ")
res0: String = one fish, two fish, red fish, blue fish
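
As a quick sanity check, the sketch below feeds the same text to the hotfix version with both kinds of line endings. The object name and the `lf`/`crlf` values are made up for illustration; only `stripMultiline` is taken from the hotfix above.

```scala
object StripMultilineCheck {
  // Same definition as the hotfix: split on an optional "\r" followed by "\n".
  def stripMultiline(input: String): String =
    input.split("(\r)?\n").map(_.dropWhile(_.isWhitespace).stripLineEnd).mkString

  def main(args: Array[String]): Unit = {
    // Illustrative inputs: identical text with LF and with CRLF line endings.
    val lf   = "\n  one fish, \n  two fish, \n  red fish, \n  blue fish\n"
    val crlf = lf.replace("\n", "\r\n")

    // Both inputs should collapse to the same single line...
    println(stripMultiline(lf) == stripMultiline(crlf))
    // ...and no carriage return should survive in the CRLF case.
    println(stripMultiline(crlf).contains("\r"))
  }
}
```

If the regex behaves as described, this prints `true` followed by `false`.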
senia
  • Yep, that's the issue. I completely forgot about Windows' weird line endings. – KChaloux Jun 25 '13 at 14:37
  • @om-nom-nom Thanks for the platform-independent suggestion. I'll definitely use that. – KChaloux Jun 25 '13 at 14:38
  • @om-nom-nom, sorry, but it's not a solution. It's a bigger problem. The source file's line endings have nothing to do with the **target** platform's line separator. It works only if you test your code on the same platform you write it on, so you'll get a bug in production, not in the test environment. – senia Jun 25 '13 at 14:40
  • @KChaloux: It's not a platform-independent suggestion. Do not ever use it! It is useful only if you are sure that the current platform's settings affect your input. That's not true in most cases. – senia Jun 25 '13 at 14:42
  • @senia yep, I forget that string comes from the source file itself, not from elsewhere on the system. – om-nom-nom Jun 25 '13 at 14:42
  • @om-nom-nom, Web-services, resource files, data files - in all these cases platform configuration is useless. You could use it only if you work with user input, but in this case you should work with encoding, not with some parts of it, like the line separator. – senia Jun 25 '13 at 14:45
  • @senia Okay. `split("(\r)?\n")` it is then. – KChaloux Jun 25 '13 at 14:48
  • @KChaloux: In my opinion, the best solution is to configure `utf-8` with `\n` separators in your IDE and use `split("\n")`. And if you are working with a string that does not come from the source file, you should use the correct encoding. `split("(\r)?\n")` is a dirty hotfix. – senia Jun 25 '13 at 14:51
  • @senia Not using an IDE at the moment, but I'll keep it in mind. – KChaloux Jun 25 '13 at 14:53