2

I have checked java.nio.file.Files.copy but that blocks a thread until the copy is done. Are there any libraries that allow one to copy a file in a non-blocking way? I need to perform many of these operations simultaneously and cannot afford to have so many threads blocked.

While I could write something myself using non-blocking streams, I would rather use something tried and tested that would guarantee a correct copy every time (or detect if something went wrong).

Eduardo
  • You could use `scala.sys.process` in the standard library to invoke `cp` (or whatever the copy utility on your platform is) directly. To be honest though, I don't think you really want to do concurrent copies. The underlying hardware, regardless of whether it is an SSD or HDD, does not provide any parallelism, and in the case of hard disk drives trying to do copies in parallel will lead to lower performance because of the increased number of seeks. What are you really trying to do? – wingedsubmariner Oct 30 '13 at 13:55
  • What kind of storage are you writing these files to? If writing to a single hard disk, consider that running more than a small number at a time will probably degrade performance and fragment the disk. – Ed Staub Oct 30 '13 at 13:56
  • @wingedsubmariner: the files are hosted in a Windows file server (I do not know the specific underlying technology) – Eduardo Oct 30 '13 at 13:57
  • @EdStaub: the files are hosted in a Windows file server (I do not know the specific underlying technology) – Eduardo Oct 30 '13 at 13:58
  • Well, then you are dealing with network delay, so you might legitimately get a performance gain, though the comments on disk performance still stand. I don't have enough Windows experience to help you, but with Linux we would use ssh+tar or rsync in order to stream all the files at once. In my experience, network mounted filesystems have terrible, terrible performance, as compared to directly streaming files. – wingedsubmariner Oct 30 '13 at 14:00
  • So, I'd suggest doing performance testing to see how many concurrent files make sense. Trying to predict likely performance is nearly pointless, but if pressed I'd guess that the best number will be between 3 and 5. – Ed Staub Oct 30 '13 at 14:02
  • @wingedsubmariner: actually, the source of the files would be in the same file system. I guess the only thing traveling over the network will be the copy command – Eduardo Oct 30 '13 at 14:02
  • If I understand you correctly, you are copying files inside of the Windows file server, but from a client machine? Forcing all the data to roundtrip to the client? Any way you can get code running on the server directly? Or perhaps the underlying protocol has a file copy command? – wingedsubmariner Oct 30 '13 at 14:06
  • There is a file copy command. I would assume there is no need for this to travel back and forth to the client. – Eduardo Oct 30 '13 at 14:23
  • @wingedsubmariner: I realize that probably the only way to do this asynchronously, and on the remote filesystem's machine, is to use the Process / ProcessBuilders classes. Thanks to both of you for the feedback. – Eduardo Oct 30 '13 at 14:28
  • @EdStaub: I realize that probably the only way to do this asynchronously, and on the remote filesystem's machine, is to use the Process / ProcessBuilders classes. Thanks to both of you for the feedback. – Eduardo Oct 30 '13 at 14:29
  • @Eduardo you can use multiple "@subject1, @subject2" to avoid duplicating your messages – maasg Oct 30 '13 at 14:58
  • @Eduardo, after digging through NIO, I see that it uses Windows CopyFileEx, which will do the remote copy without roundtripping the data, just as you describe. – Ed Staub Oct 30 '13 at 14:59
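
One way to act on the advice in the comments above (keep the number of simultaneous copies small and measure what works) is to run the blocking `Files.copy` on a small, dedicated thread pool, so that the pool size caps concurrency and the calling threads stay free. The following is only a sketch under that assumption; the class name `BoundedCopier` and the method `copyAsync` are hypothetical and not part of any library.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;

public class BoundedCopier {

    /**
     * Schedules a blocking Files.copy on copyPool and returns immediately.
     * The future completes with the target path, or exceptionally if the copy fails.
     */
    static CompletableFuture<Path> copyAsync(Path source, Path target, ExecutorService copyPool) {
        return CompletableFuture.supplyAsync(() -> {
            try {
                // Blocks only a thread of copyPool, never the caller.
                return Files.copy(source, target, StandardCopyOption.REPLACE_EXISTING);
            } catch (IOException e) {
                // Surface the failure through the future.
                throw new UncheckedIOException(e);
            }
        }, copyPool);
    }
}
```

With something like `Executors.newFixedThreadPool(4)` as `copyPool`, at most four copies run at once and the rest wait in the pool's queue; the pool size is the number to tune by measurement.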

3 Answers

0

See this question: "Iterate over lines in a file in parallel (Scala)?"

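import scala.io.Source

// Number of lines per group (a line count, not a byte count)
val chunkSize = 128 * 1024
val iterator = Source.fromFile(path).getLines.grouped(chunkSize)
iterator.foreach { lines =>
    // Process the lines of each group in parallel
    lines.par.foreach { line => process(line) }
}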

This reads (copies) the file in chunks and processes each chunk in parallel; here "par" does the parallel work.

So it is fairly non-blocking at the level of processors (cores).

You can follow the same chunking idea on a wider scale, for example using Akka / Futures / Promises.

You can tune your chunk size depending on your performance characteristics, level of system load, etc.

Another link explains a possible way to read / write data from a (property) file in parallel using Akka Actors. It is not quite what you want, but it may give you an idea.

The idea: you can build your own non-blocking way of reading / copying files.
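
As a rough illustration of the "build your own" idea, here is a sketch that copies a file chunk by chunk with `java.nio.channels.AsynchronousFileChannel` (in the JDK since Java 7) and completion callbacks, so the calling thread is never blocked. The class `AsyncChunkCopy` and its methods are hypothetical names, and `chunkSize` is assumed to be positive. Note also that on some platforms the JDK backs these channels with an internal thread pool, so this is not necessarily asynchronous at the OS level.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.AsynchronousFileChannel;
import java.nio.channels.CompletionHandler;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.concurrent.CompletableFuture;

public class AsyncChunkCopy {

    /** Starts the copy and returns at once; the future yields the number of bytes copied. */
    static CompletableFuture<Long> copy(Path source, Path target, int chunkSize) {
        CompletableFuture<Long> done = new CompletableFuture<>();
        try {
            AsynchronousFileChannel in =
                AsynchronousFileChannel.open(source, StandardOpenOption.READ);
            // Close each channel when the copy ends, successfully or not.
            done.whenComplete((bytes, error) -> closeQuietly(in));
            AsynchronousFileChannel out = AsynchronousFileChannel.open(target,
                StandardOpenOption.WRITE, StandardOpenOption.CREATE,
                StandardOpenOption.TRUNCATE_EXISTING);
            done.whenComplete((bytes, error) -> closeQuietly(out));
            readChunk(in, out, ByteBuffer.allocate(chunkSize), 0L, done);
        } catch (IOException e) {
            done.completeExceptionally(e);
        }
        return done;
    }

    /** Reads the next chunk at the given file position, then hands it to writeChunk. */
    private static void readChunk(AsynchronousFileChannel in, AsynchronousFileChannel out,
                                  ByteBuffer buffer, long position,
                                  CompletableFuture<Long> done) {
        buffer.clear();
        in.read(buffer, position, null, new CompletionHandler<Integer, Void>() {
            @Override
            public void completed(Integer bytesRead, Void unused) {
                if (bytesRead < 0) {
                    // End of file: position now equals the total bytes copied.
                    done.complete(position);
                    return;
                }
                buffer.flip();
                writeChunk(in, out, buffer, position, done);
            }

            @Override
            public void failed(Throwable error, Void unused) {
                done.completeExceptionally(error);
            }
        });
    }

    /** Writes the buffer at the same position; repeats on partial writes. */
    private static void writeChunk(AsynchronousFileChannel in, AsynchronousFileChannel out,
                                   ByteBuffer buffer, long position,
                                   CompletableFuture<Long> done) {
        out.write(buffer, position, null, new CompletionHandler<Integer, Void>() {
            @Override
            public void completed(Integer bytesWritten, Void unused) {
                long next = position + bytesWritten;
                if (buffer.hasRemaining()) {
                    writeChunk(in, out, buffer, next, done); // partial write, keep going
                } else {
                    readChunk(in, out, buffer, next, done);  // chunk done, read the next one
                }
            }

            @Override
            public void failed(Throwable error, Void unused) {
                done.completeExceptionally(error);
            }
        });
    }

    // A sketch-level simplification: errors on close are ignored here.
    private static void closeQuietly(AsynchronousFileChannel channel) {
        try {
            channel.close();
        } catch (IOException ignored) {
        }
    }
}
```

Any read or write failure completes the returned future exceptionally, which gives the caller a single place to detect that a copy went wrong.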

--

And about your statement "While I could write something myself using non-blocking streams":

Keep in mind that each OS / file system (FS) has its own rules about what blocks and when. For example, Windows locks a file (at least for writing) while one thread is writing to it; on Linux this is configurable. So if you want something stable, I would suggest thinking it through and writing your own wrapper (over the FS) based on events, chunks, and states.

ses
0

I have used the Process class, issuing an operating system command to copy the file. Of course, one has to check under which OS the application is running, and issue the appropriate command, but this allows for fast and asynchronous copies.

As Marius rightly mentions in the comments, Scala Process blocks, so I run it wrapped in a Future.
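
The approach described above (an OS copy command run inside a future, on its own execution context) might look roughly like this in plain Java 9+. This is a sketch, not the code actually used: the class `ProcessCopy`, its methods, and the choice of `cmd /c copy /Y` on Windows and `cp` elsewhere are all assumptions.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.Executor;

public class ProcessCopy {

    /** Picks a copy command for the current OS (assumed commands, adjust as needed). */
    static List<String> copyCommand(Path source, Path target) {
        boolean windows = System.getProperty("os.name").toLowerCase().startsWith("windows");
        return windows
            ? List.of("cmd", "/c", "copy", "/Y", source.toString(), target.toString())
            : List.of("cp", source.toString(), target.toString());
    }

    /**
     * Runs the copy command on blockingPool and returns immediately.
     * The future completes with the command's exit code (0 conventionally means success).
     */
    static CompletableFuture<Integer> copyAsync(Path source, Path target, Executor blockingPool) {
        return CompletableFuture.supplyAsync(() -> {
            try {
                Process process = new ProcessBuilder(copyCommand(source, target))
                    .redirectErrorStream(true)
                    // Discard output so the child cannot stall on a full pipe.
                    .redirectOutput(ProcessBuilder.Redirect.DISCARD)
                    .start();
                return process.waitFor(); // blocks only a thread of blockingPool
            } catch (IOException e) {
                throw new CompletionException(e);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new CompletionException(e);
            }
        }, blockingPool);
    }
}
```

Passing a dedicated pool (for example `Executors.newFixedThreadPool(4)`) as `blockingPool` keeps the blocking `waitFor()` calls away from the application's main threads and bounds how many copies run at once.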

Java 8 adds an isAlive() method to Process. A non-blocking alternative would be to use Java 8 processes and a scheduler that polls at regular intervals to see whether the process has finished. However, I did not need to go to this extent.
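
For reference, the polling alternative could be sketched as follows. The class `ProcessPoller`, the method `exitCodeOf`, and the `pollMillis` parameter are hypothetical; only `Process.isAlive()`, `Process.exitValue()`, and the `ScheduledExecutorService` API are real JDK calls.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

public class ProcessPoller {

    /**
     * Polls the process every pollMillis milliseconds without blocking any thread
     * in between; the future completes with the exit code once the process ends.
     */
    static CompletableFuture<Integer> exitCodeOf(Process process,
                                                 ScheduledExecutorService scheduler,
                                                 long pollMillis) {
        CompletableFuture<Integer> result = new CompletableFuture<>();
        ScheduledFuture<?> poll = scheduler.scheduleWithFixedDelay(() -> {
            if (!process.isAlive()) {
                // Safe to call exitValue() now that the process has terminated.
                result.complete(process.exitValue());
            }
        }, 0, pollMillis, TimeUnit.MILLISECONDS);
        // Stop polling once the result is known (or the caller cancels the future).
        result.whenComplete((code, error) -> poll.cancel(false));
        return result;
    }
}
```

A single scheduler thread can watch many processes this way. On Java 9 and later, `Process.onExit()` returns a `CompletableFuture<Process>` directly, which makes manual polling unnecessary.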

Eduardo
  • @MariusSoutier Why? The process runs asynchronously. – Eduardo Sep 05 '14 at 19:17
  • `The methods provided on Process make it possible for one to block until the process exits and get the exit value [...] Presently, one cannot poll the Process to see if it has finished.`. – Marius Soutier Sep 05 '14 at 21:33
  • @MariusSoutier True, but I run it within a Future. I found it more reliable to let the file system make the copy, rather than building my own in Scala / Java. – Eduardo Sep 06 '14 at 04:23
  • Yes inside a future and a separate execution context. You should add all that to the answer. – Marius Soutier Sep 06 '14 at 06:35
-1

Have you checked out the async stuff in scala-io? http://jesseeichar.github.io/scala-io-doc/0.4.2/index.html#!/core/async%20read%20write

johanandren