6

I am trying to read the contents of a URL synchronously for a simple command-line batch script in Swift. I am using cURL for simplicity's sake; I know I could use NSURLSession if I had to. I am building this with `swift build`, using the open-source version of Swift on OS X.

The problem is that for certain URLs, the NSTask never terminates if stdout has been redirected to a pipe.

// This will hang, and when terminated with Ctrl-C reports "(23) Failed writing body"
import Foundation
let task = NSTask()
let pipe = NSPipe()
task.launchPath = "/usr/bin/curl"
task.arguments = ["http://trove.nla.gov.au/newspaper/page/21704647"]
task.standardOutput = pipe
task.launch()
task.waitUntilExit()

However, if you remove the pipe, or change the URL, the task succeeds.

// This will succeed - no pipe
import Foundation
let task = NSTask()
task.launchPath = "/usr/bin/curl"
task.arguments = ["http://trove.nla.gov.au/newspaper/page/21704647"]
task.launch()
task.waitUntilExit()

// This will succeed - different URL
import Foundation
let task = NSTask()
let pipe = NSPipe()
task.launchPath = "/usr/bin/curl"
task.arguments = ["http://trove.nla.gov.au/newspaper/page/21704646"]
task.standardOutput = pipe
task.launch()
task.waitUntilExit()

Running any of the examples directly using curl from Terminal succeeds, so there is something about the interaction with NSTask, when retrieving from that specific URL (and a few others), and when a pipe is present, that is causing cURL to fail.

tobygriffin
  • Silly question, but are you sure you want to use "let" here and not "var"? If the task produces output, does it get assigned back to the variable itself? – user3069232 Mar 15 '16 at 11:08
  • For the pipe, do you mean? Output is passed to the pipe from which it can be read. Output doesn't replace the NSPipe instance itself. – tobygriffin Mar 15 '16 at 11:44
  • OK, but I see no "var" in this code, just "let"; nothing is being passed back to anything? – user3069232 Mar 15 '16 at 12:04
  • Sure, it is a minimal example. In the production code, the contents of stdout are parsed and returned to calling context. But this example is enough to illustrate the problem - task3 hangs, while task1 and task2 successfully complete. – tobygriffin Mar 15 '16 at 19:18
  • Toby, for sure you're hitting some sort of race condition or lockout issue. Have you tried moving each process to its own thread? – user3069232 Mar 16 '16 at 07:28
  • Only one NSTask is actually running. I only included the three in the code sample to demonstrate which worked and which did not. With a race condition, I would expect intermittent success and failure, not consistent success of tasks without pipes and consistent failure of tasks with pipes. Don't you agree? – tobygriffin Mar 16 '16 at 11:19
  • Are you sure it isn't related to the URL you're going to? Could that be the issue? Yes, normally race conditions are very difficult to track down because they're inconsistent, but that doesn't mean that you haven't, with considerable skill, managed to code a race condition into your app :) And does this work on the command line? Open two terminal windows and try running your two curls concurrently in them. – user3069232 Mar 16 '16 at 11:43
  • Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/106457/discussion-between-user3069232-and-tobygriffin). – user3069232 Mar 16 '16 at 11:46
  • I've encountered this same problem except with the Swift compiler (`swiftc`) and the `standardError` property. – Peter Alfvin Jan 08 '17 at 03:58

2 Answers

9

Expanding a little on @Hod's answer: the standard output of the launched process is redirected to a pipe, but your program never reads from the other end of that pipe. A pipe has a limited buffer; see, for example, "How big is the pipe buffer?", which explains that the pipe buffer size on macOS is (at most) 64 KB.

If the pipe buffer is full, the launched process cannot write to it anymore. If the process uses blocking I/O, a write() to the pipe will block until at least one byte can be written. That never happens in your case, so the process hangs and does not terminate.

The problem can occur only if the amount written to standard output exceeds the pipe buffer size, which explains why it happens only with certain URLs and not with others.
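
To see the limit for yourself, here is a minimal sketch (not part of the original answer) that fills a raw POSIX pipe with nothing reading from it and counts how many bytes it accepts. It assumes a platform where `import Foundation` exposes `pipe`, `fcntl` and `write`; the write end is made non-blocking only so that the sketch stops when the buffer is full instead of hanging the way curl does. The names `fds`, `chunk` and `total` are just illustrative.

```swift
import Foundation

// Create a raw pipe: fds[0] is the read end, fds[1] is the write end.
var fds: [Int32] = [0, 0]
precondition(pipe(&fds) == 0, "pipe() failed")
let writeEnd = fds[1]

// Make the write end non-blocking so a full buffer makes write() fail
// instead of blocking forever (which is what happens to the child process).
let flags = fcntl(writeEnd, F_GETFL)
_ = fcntl(writeEnd, F_SETFL, flags | O_NONBLOCK)

// Keep writing 1 KiB chunks; nobody ever reads from fds[0].
let chunk = [UInt8](repeating: 0, count: 1024)
var total = 0
while true {
    let written = write(writeEnd, chunk, chunk.count)
    if written <= 0 { break }   // buffer is full (or an error occurred)
    total += written
}

print("The pipe accepted \(total) bytes before it was full")
close(fds[0])
close(fds[1])
```

The exact number depends on the operating system, but any response body larger than it will stall a child process whose parent never reads.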

As a solution, you can read from the pipe, e.g. with

let data = pipe.fileHandleForReading.readDataToEndOfFile()

before waiting for the process to terminate. Another option is to use asynchronous reading, e.g. with the code from "Real time NSTask output to NSTextView with Swift":

pipe.fileHandleForReading.readabilityHandler = { fh in
    let data = fh.availableData
    // process data ...
}

That would also allow reading both standard output and standard error from a process via pipes without blocking.
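
For the two-pipe case, here is a rough sketch of one possible arrangement. Rather than `readabilityHandler`, it drains standard error with a blocking read on a background queue while the current thread drains standard output, which is a different technique with the same goal. It uses `Process` and `Pipe`, the names `NSTask` and `NSPipe` carry in Swift 3 and later. The helper `runCapturing` and the `DataBox` class are hypothetical names made up for this example, and `/bin/dd` stands in for curl so that the sketch runs without network access.

```swift
import Foundation

/// Small thread-safe box so the background reader can hand back its result.
final class DataBox: @unchecked Sendable {
    private let lock = NSLock()
    private var storage = Data()
    func set(_ data: Data) { lock.lock(); storage = data; lock.unlock() }
    func get() -> Data { lock.lock(); defer { lock.unlock() }; return storage }
}

/// Hypothetical helper: runs a command and returns its exit status
/// together with everything it wrote to stdout and stderr.
func runCapturing(_ path: String, _ arguments: [String]) throws
    -> (status: Int32, stdout: Data, stderr: Data) {
    let process = Process()
    let outPipe = Pipe()
    let errPipe = Pipe()
    process.executableURL = URL(fileURLWithPath: path)
    process.arguments = arguments
    process.standardOutput = outPipe
    process.standardError = errPipe
    try process.run()

    // Drain stderr on a background queue so that neither pipe can fill up
    // while the other one is being read.
    let errBox = DataBox()
    let errHandle = errPipe.fileHandleForReading
    let group = DispatchGroup()
    DispatchQueue.global().async(group: group) {
        errBox.set(errHandle.readDataToEndOfFile())
    }

    // Drain stdout on the current thread; this returns at end-of-file,
    // i.e. once the child has closed its end of the pipe.
    let outData = outPipe.fileHandleForReading.readDataToEndOfFile()
    group.wait()

    // Both pipes have been read to the end, so waiting cannot deadlock.
    process.waitUntilExit()
    return (process.terminationStatus, outData, errBox.get())
}

// dd writes 200 KiB of zeros to stdout (more than a 64 KB pipe buffer)
// and a short summary to stderr.
let result = try runCapturing("/bin/dd", ["if=/dev/zero", "bs=1024", "count=200"])
print(result.status, result.stdout.count, result.stderr.count)
```

The important part is the ordering: both reads start before `waitUntilExit()` is called, so the child is never left blocked on a full pipe.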

Martin R
4

Both curl and NSPipe buffer data. Based on the error you're getting when you Ctrl-C out (which indicates curl couldn't write the expected amount of data), you've got a bad interaction between the two.

Try adding the -N option to curl to prevent it from buffering its output.

curl can also output progress. I don't think that's causing a problem, but you might add -s to only get the data just in case.

Hod
  • Note that as the contributor of the bounty, I'm interested in an explanation/solution that is focused on the general issue of how specification of a pipe interferes with the completion of the task. Per my comment, I ran into this issue with the swiftc compiler and stderr, not cURL and stdout. – Peter Alfvin Jan 08 '17 at 18:19
  • @PeterAlfvin can you post a code sample? I'm not able to reproduce the problem using stderr. – Hod Jan 08 '17 at 18:38
  • See details posted on http://stackoverflow.com/questions/41537800/random-hang-by-macos-task-process-when-specifying-stderr-pipe – Peter Alfvin Jan 08 '17 at 20:55