Running and calling into a Python program as a persistent subprocess

Question

I am writing a microservice in Haskell and it seems that we'll need to call into a Python library. I know how to create and configure a process to do that from Haskell, but my Python is rusty. Here's the logic I am trying to implement:

The Haskell application initializes by creating a persistent subprocess (lifetime of the subprocess = lifetime of the parent process) running a minimized application serving the Python library.
The Haskell application receives a network request and sends exactly 1 chunk of data (i.e. bytestring or text) to the Python subprocess over stdin; it then blocks until exactly 1 chunk of data arrives on the subprocess's stdout, collects the result and returns it as a response.

I've looked around, and the closest solutions I was able to find were:

Both handle only the part I know how to handle (i.e. calling into a Python subprocess) while not dealing with the details of the Python code run from the subprocess -- hence this question.

The obvious alternative would be to simply create, run and stop a subprocess whenever the Haskell application needs it, but the overhead is unpleasant.

I've tried something whose minimized version looks like:

-- From the Haskell parent process
{-# LANGUAGE OverloadedStrings #-}

import           System.IO
import           System.Process.Typed

configProc :: ProcessConfig Handle Handle ()
configProc =
    setStdin createPipe $
    setStdout createPipe $
    setStderr closed $
    setWorkingDir "/working/directory" $
    shell "python3 my_program.py"

startPyProc :: IO (Process Handle Handle ())
startPyProc = do
    p <- startProcess configProc
    hSetBuffering (getStdin p) NoBuffering
    hSetBuffering (getStdout p) NoBuffering
    pure p

main :: IO ()
main = do
    p <- startPyProc
    let stdin = getStdin p
        stdout = getStdout p
    hSetBuffering stdin NoBuffering
    hSetBuffering stdout NoBuffering
    -- hGetLine won't get anything before I call hClose,
    -- making it impossible to stream over both stdin and stdout
    hPutStrLn stdin "foo" >> hClose stdin >> hGetLine stdout >>= print

# From the Python child process
import sys

if __name__ == '__main__':
    for line in sys.stdin:
        # do some work and finally...
        print(result)

One issue with this code is that I have not been able to send to stdin and receive from stdout without first closing the stdin handle, which makes the implementation unable to do what I want (send 1 chunk to stdin, block, read the result from stdout, rinse and repeat). Another potential issue is that the Python code might not be adequate at all for the specification I am trying to meet.

The fact that your other program is written in Python isn't really relevant. — chepner, Jun 11 '21 at 00:09
It deadlocks in my tests, but I'd like to focus the question on the approach -- hopefully with examples -- that is recommended to meet the specification I've detailed. — WhyNotTryCalmer, Jun 11 '21 at 00:11
"The fact that your other program is written in Python isn't really relevant." The parent process is not written in Python -- it is written in Haskell. But I don't really understand the point of your comment. I am simply trying to have the Python code continuously (and forever, basically as a server) acquire chunks of data through stdin, process them and send them back over stdout, with the parent process assuming the entire responsibility of when to stop the process. — WhyNotTryCalmer, Jun 11 '21 at 00:16
And it will. Haskell will start a process, write something to its standard input when it needs to, and wait for something written to its standard output. Once it is done, your Haskell program closes its end of the pipe the process uses for standard input, and your program is written so that the loop terminates when `sys.stdin` is closed. — chepner, Jun 11 '21 at 01:13
you would have to create minimal working code in Python and Haskell so we could test it. — furas, Jun 11 '21 at 01:46
Don't know any python, but I bet `for line in sys.stdin` is waiting for a line to be written to stdin. — pedrofurla, Jun 11 '21 at 04:27
@furas "your program is written so that the loop terminates when `sys.stdin` is closed". That's helpful, thanks! Will try to improve thanks to your hint. — WhyNotTryCalmer, Jun 11 '21 at 06:54
Can you clarify which side of the setup you are having problems with? Most Python people cannot help you with the Haskell part, and I assume vice versa it's true as well. The last sentence implies the issue is with Python, but the rest of the picture veers from one to the other. — MisterMiyagi, Jun 11 '21 at 07:07
@MisterMiyagi Sure. I am having a problem with the last line of the Haskell snippet. My belief is that the problem is caused by the Python implementation. (In point of fact, chepner's last comment suggests there is an incompatibility between the two snippets, in that the Haskell code needs to close stdin to be able to read stdout while the Python code needs to have stdin open to be able to run.) If my belief is wrong, I'll need to try and understand how to stream in both directions between processes without closing the stdin handle. — WhyNotTryCalmer, Jun 11 '21 at 07:51
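
The exchange discussed in these comments (write one line, block on one reply, keep stdin open until the very end) can be tried out without the Haskell side. The following is only a sketch under stated assumptions: Python's subprocess module stands in for the Haskell parent, and CHILD_SOURCE and round_trips are hypothetical names with a toy upper-casing child, not code from the question.

```python
import subprocess
import sys

# Hypothetical child program: answers each input line with its upper-case form.
# flush=True makes every reply leave the child's buffer immediately.
CHILD_SOURCE = """
import sys
for line in sys.stdin:
    print(line.rstrip("\\n").upper(), flush=True)
"""


def round_trips(requests):
    """Send each request to one long-lived child and collect one reply per request."""
    replies = []
    with subprocess.Popen(
        [sys.executable, "-c", CHILD_SOURCE],
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        text=True,
    ) as child:
        for request in requests:
            # Write exactly one line and flush so it reaches the child now.
            child.stdin.write(request + "\n")
            child.stdin.flush()
            # Block until exactly one line comes back; stdin stays open.
            replies.append(child.stdout.readline().rstrip("\n"))
    # Leaving the with-block closes the pipes; EOF on stdin ends the child's loop.
    return replies
```

Calling round_trips(["foo", "bar"]) performs two request/response cycles against the same child process. If flush=True is dropped from the child, the parent's first readline is expected to hang, which matches the deadlock described above.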

score 0 · Accepted Answer · answered Jun 16 '21 at 21:12

Got it fixed by simply replacing print(...) with print(..., flush=True). When stdout is a pipe rather than a terminal, Python block-buffers it by default, so the result sat in the child's buffer and my call to hGetLine blocked, waiting for a line that had not been written to the pipe yet.
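
A minimal sketch of the child loop with that change applied, assuming one request per line and one reply per line. The names handle and serve are hypothetical, and handle is only a placeholder for the real library call.

```python
import sys


def handle(chunk: str) -> str:
    # Hypothetical stand-in for the call into the real Python library.
    return chunk.upper()


def serve(inp=sys.stdin, out=sys.stdout) -> None:
    """Answer every input line with exactly one output line until EOF."""
    for line in inp:
        result = handle(line.rstrip("\n"))
        # flush=True sends the reply through the pipe right away instead of
        # leaving it in the block buffer until it fills or the process exits.
        print(result, file=out, flush=True)
```

In my_program.py, serve() would be called under the usual `if __name__ == '__main__':` guard. The loop ends when the parent closes its end of stdin, as chepner's comment describes. Starting the interpreter with python3 -u (or setting PYTHONUNBUFFERED=1) should be another way to get unbuffered stdout without touching each print call.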

WhyNotTryCalmer