
A byte order mark (BOM) is making my regex fail when I use scala.io.Source to read from a file. This answer is a lightweight solution using java.io. Is there anything similar for scala.io.Source, or will I have to fall back to Java because of a single character?

parazs
    Could you just use `Source.fromInputStream` to wrap one of the answers from the linked question? – Joe K Dec 15 '17 at 19:40

1 Answer


Based on Joe K's idea in his comment, and using Andrei Punko's answer to the Java version of the problem together with Alvin Alexander's Scala code, the simplest solution for reading a file that may contain a byte order mark into an array of strings is:

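import java.io.{BufferedReader, FileInputStream, IOException, InputStreamReader, Reader}
import java.nio.charset.StandardCharsets
import scala.collection.mutable.ArrayBuffer

// Consumes a leading byte order mark (U+FEFF) if present; otherwise rewinds.
// The reader must support mark/reset (BufferedReader does).
@throws[IOException]
def skip(reader: Reader): Unit = {
    reader.mark(1)
    val possibleBOM = new Array[Char](1)
    reader.read(possibleBOM)
    if (possibleBOM(0) != '\ufeff') reader.reset()
}

// Assumption: the file is UTF-8. Decoding explicitly as UTF-8 turns the
// BOM bytes EF BB BF into the single char U+FEFF on every platform.
val br = new BufferedReader(new InputStreamReader(new FileInputStream(file), StandardCharsets.UTF_8))
skip(br)

val lines = {
    val ls = new ArrayBuffer[String]()
    var l: String = null
    // readLine returns null at end of input
    while ({l = br.readLine(); l != null}) {
      ls.append(l)
    }
    br.close()
    ls.toArray
}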
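
If you would rather stay with scala.io.Source, a different approach is to let Source decode the file and then drop a leading U+FEFF from the first line. This is only a minimal sketch and not the method above: `linesWithoutBom` is a hypothetical helper name, the input is assumed to be UTF-8, and the demo reads from in-memory bytes instead of a real file.

```scala
import java.io.ByteArrayInputStream
import java.nio.charset.StandardCharsets
import scala.io.{Codec, Source}

// Hypothetical helper (not part of scala.io): yields the lines of `src`,
// removing a leading U+FEFF from the first line if one is there.
def linesWithoutBom(src: Source): Iterator[String] = {
  val it = src.getLines()
  if (it.hasNext) Iterator.single(it.next().stripPrefix("\uFEFF")) ++ it
  else it
}

// Demo input: a UTF-8 BOM (bytes EF BB BF) followed by two lines of text.
val bomBytes = Array(0xEF, 0xBB, 0xBF).map(_.toByte)
val demoBytes = bomBytes ++ "first\nsecond".getBytes(StandardCharsets.UTF_8)

// Assumption: the data is UTF-8, so the codec is passed explicitly.
val demoSource = Source.fromInputStream(new ByteArrayInputStream(demoBytes))(Codec.UTF8)
val cleaned = try linesWithoutBom(demoSource).toArray finally demoSource.close()
```

For a real file, `Source.fromFile(file)(Codec.UTF8)` can be passed to the same helper. This only handles the BOM as it appears after UTF-8 decoding; other encodings would need their own handling.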
parazs