
I can find tons of examples but they seem to either rely mostly on Java libraries or just read characters/lines/etc.

I just want to read a file and get a byte array using Scala libraries. Can someone help me with that?

Matthew Farwell
fgysin
  • I think relying on Java libraries is what (almost?) everyone would do, the Scala library included. See for instance the source code of scala.io.Source. – Philippe Sep 29 '11 at 13:44
  • I know Scala relies on Java. But what is the point of a language where I cannot even do simple file I/O without using *a different language*? – fgysin Sep 29 '11 at 13:51
  • You're not using a different language, just a standard JVM API that has proved good enough not to need replacing! – Duncan McGregor Sep 29 '11 at 14:12
  • Hm, yeah, you are probably right... Still, it feels like cheating. :) – fgysin Sep 29 '11 at 14:21
  • Well, how do you think the Java classes are implemented? Deep down, somewhere, there is a native method: it has just a signature, no Java implementation, and relies on an OS-specific C implementation. Isn't that cheating too? :) – Philippe Sep 29 '11 at 14:47
  • It should be said that Scala on .NET does make this a more pressing issue. – Duncan McGregor Sep 29 '11 at 20:19
  • @Duncan McGregor: Good point, I guess the transition isn't as smooth there... – fgysin Sep 30 '11 at 09:57
  • @Philippe: Sure, and using C is only cheating on assembly :P... What I meant is just that the border between languages is usually rather clearly defined; Scala and Java sort of melt into each other. – fgysin Sep 30 '11 at 09:59
  • Possible duplicate of [What is the proper way to code a read-while loop in Scala?](http://stackoverflow.com/questions/3011106/what-is-the-proper-way-to-code-a-read-while-loop-in-scala) – Suma Feb 06 '15 at 09:14

8 Answers


Java 7:

import java.nio.file.{Files, Paths}

val byteArray = Files.readAllBytes(Paths.get("/path/to/file"))

I believe this is the simplest way possible. Just leveraging existing tools here. NIO.2 is wonderful.
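A minimal sketch of the same call wrapped in `scala.util.Try`, so that a missing or unreadable file comes back as a `Failure` rather than a thrown exception. `ReadAllBytesExample` and `readBytes` are hypothetical names. `Files.readAllBytes` opens and closes the file itself, and its javadoc notes that it is not intended for very large files.

```scala
import java.nio.file.{Files, Path}
import scala.util.Try

object ReadAllBytesExample {
  // Reads the whole file into memory; Files.readAllBytes opens and closes the file itself.
  // Try turns NoSuchFileException / IOException into a Failure value instead of throwing.
  def readBytes(path: Path): Try[Array[Byte]] = Try(Files.readAllBytes(path))

  def main(args: Array[String]): Unit = {
    val tmp = Files.createTempFile("example", ".bin")
    Files.write(tmp, Array[Byte](0x01, 0x7f, 0xea.toByte))
    val bytes = readBytes(tmp).get
    // Bytes are signed on the JVM; & 0xff shows them as unsigned values.
    println(bytes.map(b => b & 0xff).mkString(",")) // prints 1,127,234
    Files.delete(tmp)
  }
}
```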

Vladimir Matveev

This should work (Scala 2.8):

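import java.io.{BufferedInputStream, FileInputStream}
val bis = new BufferedInputStream(new FileInputStream(fileName))
// read() returns -1 at end of stream; finally closes the stream even if reading fails
val bArray = try Stream.continually(bis.read).takeWhile(_ != -1).map(_.toByte).toArray finally bis.close()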
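Since Scala 2.13, `Stream` is deprecated in favour of `LazyList`. A sketch of the same byte-by-byte idea with `LazyList`, assuming Scala 2.13 or later; the object and method names are hypothetical. It still reads one byte per call, so the speed concerns raised in the comments apply equally.

```scala
import java.io.{BufferedInputStream, FileInputStream}

object LazyListRead {
  // Same approach as the Stream version: pull bytes until read() signals end of stream (-1).
  def readBytes(fileName: String): Array[Byte] = {
    val bis = new BufferedInputStream(new FileInputStream(fileName))
    try LazyList.continually(bis.read()).takeWhile(_ != -1).map(_.toByte).toArray
    finally bis.close() // always release the file handle
  }
}
```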
Jus12
  • I think this is a great example of wrapping a Java API function to get Stream semantics. Much appreciated. – qu1j0t3 Oct 28 '12 at 22:37
  • `val bis = new java.io.BufferedInputStream(new java.io.FileInputStream(fileName))` if you have not imported the java.io classes. – BeniBela Sep 21 '13 at 16:29
  • Using this approach, is closing the file also needed, or is it implicit? – Max Nov 20 '13 at 00:18
  • You need to close it yourself. – Tony K. Apr 01 '14 at 23:41
  • This approach is slow, since it needs to process each and every byte. Ideally, I/O operations should be block-based. – Dibbeke Aug 31 '14 at 17:10
  • I benchmarked it against a buffered approach: it's about 500 times slower in my test (test config: compute the CRC32 of a 14 MB file that is repeatedly re-read from SSDs in RAID-0, so it's in the system file cache; Intel Core i7 2nd gen; 16 GB RAM). – morfizm Oct 21 '17 at 09:23
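Following the comments about per-byte speed, here is a sketch of a block-based read: the stream is copied into a `ByteArrayOutputStream` in fixed-size chunks and is always closed. `BlockRead`, `readAll`, `readFile` and the 8192-byte default chunk size are assumptions for illustration, not part of the answer above.

```scala
import java.io.{ByteArrayOutputStream, FileInputStream, InputStream}

object BlockRead {
  // Copies the stream into memory one chunk at a time instead of one byte per call.
  def readAll(in: InputStream, chunkSize: Int = 8192): Array[Byte] = {
    val out = new ByteArrayOutputStream()
    val buf = new Array[Byte](chunkSize)
    var n = in.read(buf) // number of bytes read into buf, or -1 at end of stream
    while (n != -1) {
      out.write(buf, 0, n) // only the first n bytes of buf are valid
      n = in.read(buf)
    }
    out.toByteArray
  }

  // Opens the file, reads it fully, and closes it even if reading throws.
  def readFile(fileName: String): Array[Byte] = {
    val in = new FileInputStream(fileName)
    try readAll(in) finally in.close()
  }
}
```

A larger chunk means fewer `read` calls; the exact size is a tuning choice rather than something the comments specify.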

The library scala.io.Source is problematic: DON'T USE IT to read binary files.

The error can be reproduced as instructed here: https://github.com/liufengyun/scala-bug

The file data.bin contains the hexadecimal byte 0xea, which is 11101010 in binary and should be read as 234 in decimal.

The main.scala file contains two ways to read the file:

import scala.io._
import java.io._

object Main {
  def main(args: Array[String]): Unit = {
    val ss = Source.fromFile("data.bin")
    println("Scala:" + ss.next().toInt)
    ss.close()

    val bis = new BufferedInputStream(new FileInputStream("data.bin"))
    println("Java:" + bis.read())
    bis.close()
  }
}

When I run `scala main.scala`, the program outputs the following:

Scala:205
Java:234

The Java library produces the correct output, while the Scala library does not.

fengyun liu
  • If I set the encoding to `Source.fromFile("data.bin", "ISO8859-1")`, it works well. – fengyun liu Jan 21 '14 at 15:57
  • Maybe it's helpful, but really, this isn't an answer. Introducing a new problem in an answer is not constructive and belongs somewhere else. – Benjamin May 18 '17 at 05:14
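The ISO-8859-1 workaround in the comment above works because that charset maps each of the 256 byte values to the character with the same code point (U+0000 to U+00FF), so decoding is lossless and `toByte` recovers the original byte. A sketch under that assumption, with hypothetical names; it still processes one character at a time, so it is slow for large files.

```scala
import scala.io.{Codec, Source}

object Latin1Read {
  // Decode with ISO-8859-1 (1 byte -> 1 char, no malformed input possible),
  // then narrow each char back to a byte.
  def readBytes(fileName: String): Array[Byte] = {
    val src = Source.fromFile(fileName)(Codec.ISO8859)
    try src.map(_.toByte).toArray finally src.close()
  }
}
```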
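import java.io.FileInputStream

val is = new FileInputStream(fileName)
// Caution: available() is only an estimate of the readable bytes (see the comment below),
// and a single read() call may fill less of the array than requested.
val cnt = is.available
val bytes = Array.ofDim[Byte](cnt)
is.read(bytes)
is.close()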
reivzy
  • It is not a valid solution. From the javadoc of InputStream.available: `Note that while some implementations of InputStream will return the total number of bytes in the stream, many will not. It is never correct to use the return value of this method to allocate a buffer intended to hold all data in this stream.` – m.bemowski Sep 27 '18 at 14:22
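One way to keep the pre-sized-array idea without relying on `available()` is sketched below (hypothetical names): size the array from `File.length` and fill it with `DataInputStream.readFully`, which keeps reading until the array is full and throws `EOFException` if the stream ends first. It assumes the file is smaller than 2 GB and is not modified while it is being read.

```scala
import java.io.{DataInputStream, File, FileInputStream}

object SizedRead {
  def readBytes(fileName: String): Array[Byte] = {
    val file = new File(fileName)
    // Size from the file system, not from InputStream.available (assumes < 2 GB).
    val bytes = new Array[Byte](file.length.toInt)
    val in = new DataInputStream(new FileInputStream(file))
    // readFully loops internally until every slot of the array has been filled.
    try in.readFully(bytes) finally in.close()
    bytes
  }
}
```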

You might also consider using scalax.io:

scalax.io.Resource.fromFile(fileName).byteArray
OlivierBlanvillain

You can use IOUtils from Apache Commons Compress:

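import java.io.{File, FileInputStream}
import org.apache.commons.compress.utils.IOUtils

val file = new File("data.bin")
val in = new FileInputStream(file)
val bytes = try IOUtils.toByteArray(in) finally in.close() // close the stream even if reading fails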
JavaNoScript
Sagi
  • I had to import org.apache.commons.io.IOUtils instead of the suggested import. – niid Feb 27 '20 at 07:54
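If an extra dependency is undesirable, Java 9 and later provide `InputStream.readAllBytes`, which reads until end of stream much like `IOUtils.toByteArray`. A sketch assuming a Java 9+ runtime; the object and method names are hypothetical.

```scala
import java.io.{File, FileInputStream}

object StdlibToByteArray {
  // Java 9+: InputStream.readAllBytes reads every remaining byte of the stream.
  def readBytes(file: File): Array[Byte] = {
    val in = new FileInputStream(file)
    try in.readAllBytes() finally in.close()
  }
}
```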

Asynchronous file reading using Scala Futures and Java NIO.2:

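import java.nio.ByteBuffer
import java.nio.channels.{AsynchronousFileChannel, CompletionHandler}
import java.nio.file.{Path, StandardOpenOption}
import scala.concurrent.{ExecutionContext, Future, Promise}
import scala.util.Try

object AsyncFileReader {
  def readFile(path: Path)(implicit ec: ExecutionContext): Future[Array[Byte]] = {
    val p = Promise[Array[Byte]]()
    try {
      val channel = AsynchronousFileChannel.open(path, StandardOpenOption.READ)
      // The whole file must fit in one array, i.e. be smaller than 2 GB.
      val buffer = ByteBuffer.allocate(channel.size().toInt)
      channel.read(buffer, 0L, buffer, onComplete(channel, p))
    }
    catch {
      case t: Exception => p.failure(t)
    }
    p.future
  }

  private def onComplete(channel: AsynchronousFileChannel, p: Promise[Array[Byte]]) = {
    new CompletionHandler[Integer, ByteBuffer]() {
      def completed(res: Integer, buffer: ByteBuffer): Unit = {
        // One read may transfer fewer bytes than requested, so continue from the
        // current buffer position until the buffer is full or end of file (-1).
        if (res.intValue() >= 0 && buffer.hasRemaining)
          channel.read(buffer, buffer.position().toLong, buffer, this)
        else {
          channel.close()
          p.complete(Try(buffer.array()))
        }
      }

      def failed(t: Throwable, buffer: ByteBuffer): Unit = {
        channel.close()
        p.failure(t)
      }
    }
  }
}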
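If truly non-blocking I/O is not required, a simpler (and different) technique is to run the ordinary blocking `Files.readAllBytes` inside a `Future` on an `ExecutionContext`. This sketch uses hypothetical names and still occupies a thread for the duration of the read; `blocking` only hints to the pool that the task may block.

```scala
import java.nio.file.{Files, Path}
import scala.concurrent.{ExecutionContext, Future, blocking}

object FutureRead {
  // Runs the blocking read on a pool thread; the Future fails if readAllBytes throws.
  def readFile(path: Path)(implicit ec: ExecutionContext): Future[Array[Byte]] =
    Future(blocking(Files.readAllBytes(path)))
}
```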

I used the code below to read a CSV file.

import scala.io.Source.fromFile

readFile("C:/users/xxxx/Downloads/", "39025968_ccccc_1009.csv")

def readFile(loc: String, filenm: String): Unit = {

  val flnm = fromFile(s"$loc$filenm") // fromFile is imported from scala.io.Source

  println("Files testing")
  /*for (line <- flnm.getLines()) {
    printf("%4d %s\n", line.length, line)
  }*/
  try flnm.getLines().foreach(println) // getLines() iterates over the file line by line
  finally flnm.close()
}
jwvh
Kareek
  • With a question this old (asked over 9 years ago), and with so many answers already submitted, it is helpful to point out how your new answer is different from the previous answers. (And including code that's been commented out just looks sloppy.) – jwvh Oct 26 '20 at 23:54
  • Yeah, the other answers clearly show a byte array being returned; this one is really not clear. – Alistair McIntyre Feb 05 '21 at 09:58