I have written a function that takes an InputStream as a parameter and returns an Iterator[String] over its lines. I accomplish this as follows:

import java.io.InputStream
import scala.io.Source

def lineEntry(fileInputStream: InputStream): Iterator[String] = {
  Source.fromInputStream(fileInputStream).getLines()
}

I use the method as follows:

val fStream = getSomeInputStreamFromSource()
lineEntry(fStream).foreach{
  processTheLine(_)
}

The lines are decoded lazily, so the iterator returned by lineEntry may throw an exception (for example java.nio.charset.MalformedInputException) partway through the foreach if the input stream contains bytes that are not valid in the expected encoding.
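A minimal sketch of the failure, assuming the stream is decoded as UTF-8. The names MalformedDemo and failsOnBadByte are made up for illustration, and the byte 0xFF is just one convenient example of input that is never valid UTF-8:

```scala
import java.io.ByteArrayInputStream
import java.nio.charset.MalformedInputException
import scala.io.{Codec, Source}

object MalformedDemo {
  // "ok\n" followed by the byte 0xFF, which can never appear in valid UTF-8.
  val bytes: Array[Byte] =
    "ok\n".getBytes("UTF-8") ++ Array(0xFF.toByte, '\n'.toByte)

  // Returns true if decoding the bytes as strict UTF-8 fails while iterating.
  def failsOnBadByte(): Boolean = {
    // A fresh codec with no error action configured; malformed input is reported.
    val codec = Codec("UTF-8")
    try {
      Source.fromInputStream(new ByteArrayInputStream(bytes))(codec)
        .getLines()
        .foreach(_ => ())
      false
    } catch {
      case _: MalformedInputException => true
    }
  }
}
```

The exception comes from the iteration itself, not from the call that creates the iterator, so a try/catch has to wrap the foreach.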

What are some of the ways to counter this situation?

sc_ray

1 Answer

Quick solution (for Scala 2.10):

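import java.io.InputStream
import java.nio.charset.CodingErrorAction
import scala.io.{Codec, Source}

def lineEntry(fileInputStream: InputStream): Iterator[String] = {
  implicit val codec = Codec.UTF8 // or any other you like
  // Drop byte sequences that cannot be decoded instead of throwing
  codec.onMalformedInput(CodingErrorAction.IGNORE)

  // The implicit codec above is picked up here
  Source.fromInputStream(fileInputStream).getLines()
}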

In Scala 2.9 there's a small difference:

implicit val codec = Codec(Codec.UTF8)

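Codec also has a few more configuration options for tuning its behaviour in such cases, such as onUnmappableCharacter and decodingReplaceWith. Passing CodingErrorAction.REPLACE instead of IGNORE substitutes a replacement character for the bad input rather than dropping it.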
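As a rough sketch of those options, here is a variant that replaces undecodable bytes instead of ignoring them. The names LenientLines and lenientLineEntry are hypothetical. It builds a new codec with Codec("UTF-8") on the assumption that Codec.UTF8 is a shared instance in 2.10 and later, in which case configuring it directly would also affect other code that uses it:

```scala
import java.io.{ByteArrayInputStream, InputStream}
import java.nio.charset.CodingErrorAction
import scala.io.{Codec, Source}

object LenientLines {
  // Hypothetical variant of lineEntry: bad bytes become the decoder's
  // replacement character instead of raising an exception.
  def lenientLineEntry(in: InputStream): Iterator[String] = {
    // Codec("UTF-8") creates a new instance, so nothing shared is modified.
    val codec = Codec("UTF-8")
      .onMalformedInput(CodingErrorAction.REPLACE)
      .onUnmappableCharacter(CodingErrorAction.REPLACE)
    // Pass the codec explicitly rather than through an implicit val.
    Source.fromInputStream(in)(codec).getLines()
  }

  // Sample input: a valid line, then a line with one invalid byte (0xFF).
  val sample: Array[Byte] =
    "ok\n".getBytes("UTF-8") ++ Array('a'.toByte, 0xFF.toByte, 'b'.toByte)

  val lines: List[String] =
    lenientLineEntry(new ByteArrayInputStream(sample)).toList
}
```

With REPLACE the invalid byte shows up as U+FFFD in the second line, which makes damaged input easier to spot downstream than with IGNORE.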

ghik
  • Thanks. It seems that onMalformedInput is not available on scala.io.Codec.UTF8 in my version of the compiler. Is this a 2.10 feature? I am using 2.9.2 – sc_ray Apr 06 '13 at 00:36
  • I am trying this: implicit val codec = Codec.UTF8.newDecoder(); codec.onMalformedInput(CodingErrorAction.IGNORE) – sc_ray Apr 06 '13 at 00:44
  • Thanks. I haven't tested it yet, but I will mark this as the answer. Do you know by any chance where I can find the source for the Codec class? – sc_ray Apr 06 '13 at 01:12
  • @sc_ray Here's 2.9.2 version: https://github.com/scala/scala/blob/v2.9.2/src/library/scala/io/Codec.scala – ghik Apr 06 '13 at 01:14