0

From a C program, I'm trying to fork off a child process to run a Python script, which should process lines from the parent on its standard input and return results back to parent on standard output, i.e., the parent will feed input lines and read results from the child standard output.

Only both parent and child are just stuck on reading their first line of data from one another.

Why?

Here's a sample run, starting C-parent:

As root:

./pycalc

Output:

dude
Got this to do calc with: dude
^CTraceback (most recent call last):
  File "./pycalc.py", line 9, in <module>
    line = raw_input()
KeyboardInterrupt

Here's the sample python script:

#!/usr/bin/python

import base64
import os

line = ''
while True:
  try:
    line = raw_input()
  except EOFError:
    break

  print(base64.encodestring(line))

print('Bye - End of Calc!')

And here's the parent C source:

#define PARENT_READ  read_pipe[0]
#define PARENT_WRITE write_pipe[1]
#define CHILD_WRITE  read_pipe[1]
#define CHILD_READ   write_pipe[0]

int main(int argc, char *argv[])
{
  int            pid;
  int            read_pipe[2];
  int            write_pipe[2];
  FILE          *wrp = (FILE *)NULL;
  FILE          *rdp = (FILE *)NULL;

  char           line[1024+1];

  if (pipe(read_pipe)  < 0)
    exit(-1);
  if (pipe(write_pipe) < 0)
    exit(-1);

  if ((pid = fork()) < 0)
    exit(-1);
  else if (pid == 0) {
    // Child process go here
    close(PARENT_READ);
    close(PARENT_WRITE);

    dup2(CHILD_READ, STDIN_FILENO);
    dup2(CHILD_WRITE, STDOUT_FILENO);
    close(CHILD_READ);
    close(CHILD_WRITE);
    setlinebuf(stdin);
    setlinebuf(stdout);
    const char *argv[] = {"pycalc.py", NULL};
    execvp("pycalc.py", (char **)argv);
    exit(errno);
  }

  // Parent process go here
  close(CHILD_READ);
  close(CHILD_WRITE);

  if ((wrp = fdopen(PARENT_WRITE, "w")) == NULL)
    exit(-1);
  setlinebuf(wrp);

  if ((rdp = fdopen(PARENT_READ, "r")) == NULL)
    exit(-1);
  setlinebuf(rdp);

  while (fgets(line, sizeof(line), stdin) != NULL) {
    printf("Got this to do calc with: %s", line);
    fprintf(wrp, "%s", line);
    fflush(wrp);
    fgets(line, sizeof(line), rdp);
    printf("Got this calc from child: %s", line);
  }

  exit(0);
}
Peter Mortensen
  • 30,738
  • 21
  • 105
  • 131
  • By the way it should be `exit(EXIT_FAILURE)` not `exit(-1)`. – John Zwinck Sep 24 '18 at 10:58
  • 1
    Any specific reason why to use fork, instead of [popen](https://linux.die.net/man/3/popen)? – hellow Sep 24 '18 at 11:01
  • 2
    It looks like both parent and child try to read first, in blocking fashion. Naturally this will deadlock. No? – John Zwinck Sep 24 '18 at 11:01
  • 1
    `setlinebuf` in child before `execvp` have no effect. – Maxim Egorushkin Sep 24 '18 at 11:19
  • 1
    @hellow : you can only do I/O in one direction with `popen()`. This needs bidirectional communication. – Jonathan Leffler Sep 24 '18 at 12:14
  • What happens if you use something like `printf ‘%s\n’ dude how are you | ./pycalc.py | cat`? This gives your script piped input and output; does it produce 4 base-64 encoded lines? Or does it produce just one? – Jonathan Leffler Sep 24 '18 at 12:27
  • @JonathanLeffler it behaves at expected: # echo 'dude how are you' | ./pycalc.py | cat - ZHVkZSBob3cgYXJlIHlvdQ== Bye - End of Calc! And returns just one line – Steffen Winther Sørensen Sep 24 '18 at 12:31
  • No just one base64 line, the Bye is exit greeting – Steffen Winther Sørensen Sep 24 '18 at 12:33
  • @MaximEgorushkin I know, but I figured just in case that changing to line buffering :) – Steffen Winther Sørensen Sep 24 '18 at 12:34
  • Ok. So the same is going to happen with your program. It will wait until the input is complete. You got impatient and hit interrupt (control-C); if you indicated EOF with control-D instead, you’d probably get the base-64 encoding back. – Jonathan Leffler Sep 24 '18 at 12:35
  • @hellow You're right, man page now says: 'popen() is now implemented using a bidirectional pipe' maybe I should try this... – Steffen Winther Sørensen Sep 24 '18 at 12:37
  • @JonathanLeffler no raw_input should return at newline chars in input stream also if I run pycacl.py alone it writes result after every input line given – Steffen Winther Sørensen Sep 24 '18 at 12:39
  • @JonathanLeffler using printf still only returns one base64 line as expected due to just one newline char in input – Steffen Winther Sørensen Sep 24 '18 at 12:41
  • @hellow popen fails to open bidirectional it seems: 37 FILE *pop = (FILE *)NULL; (gdb) 71 if ( (pop = popen("./pycalc.py","r+")) == (FILE *)NULL ) exit(-1); (gdb) Program exited with code 0377. – Steffen Winther Sørensen Sep 24 '18 at 12:51
  • Bash on my Mac screws up when executing `printf '%s\n' dude how are you` as copied from my comment (I'm sorry; I typed it on an iPhone in the app). It used smart quotes on me, dammit, instead of simple single quotes I intended to type. The version in this comment has single quotes — I typed it in a browser on my Mac, not the app. There are times, and this is one of them, when I hate spell-manglers and 'smart phones' which aren't smart enough to understand when I'm typing code, not just text. – Jonathan Leffler Sep 24 '18 at 12:57

1 Answers1

1

I've done enough experimentation with some command line tools of my own devising to convince me that the problem is the way Python works in this piped context. It doesn't process the data until it gets EOF (because the data fed isn't big enough to fill buffers). So, despite the best efforts of raw_input and attempts to set line buffering before Python is executed, it isn't reading data as soon as lines are available. My test pipeline has 6 processes in a row:

timecmd -m -- cat -u | tstamp -f3 | tee /dev/tty | timecmd -m -- pycalc.py |
tstamp -f3 | tee $(isodate -c).log

The timecmd command reports the start time of the command, runs the command, and reports the exit time (-m for millisecond times). The tstamp -f3 prefixes each line of output with a timestamp (-f3 for millisecond times). Using cat -u does unbuffered output; it reports the lines as they're available; the tee /dev/tty shows the timestamped output on the terminal and feeds it to pycalc.py; the output is again timestamped, and then piped to tee for logging (the combination with isodate -c generates a log file name such as 20180924.072626.log). When I use this, I see clearly that pycalc.py doesn't do a thing until I indicate EOF to the cat -u.

$ timecmd -m -- cat -u | tstamp -f3 | tee /dev/tty | timecmd -m -- pycalc.py |
> tstamp -f3 | tee $(isodate -c).log
2018-09-24 07:29:06.419 [PID 44960] cat -u
2018-09-24 07:29:06.419 [PID 44961] pycalc.py
dude
2018-09-24 07:29:09.274: dude
how
2018-09-24 07:29:10.819: how
are
2018-09-24 07:29:12.349: are
you
2018-09-24 07:29:16.540: you
2018-09-24 07:29:26.205 [PID 44960; status 0x0000]  -  19.786s
2018-09-24 07:29:26.209 [PID 44961; status 0x0000]  -  19.789s
2018-09-24 07:29:26.208: MjAxOC0wOS0yNCAwNzoyOTowOS4yNzQ6IGR1ZGU=
2018-09-24 07:29:26.209:
2018-09-24 07:29:26.209: MjAxOC0wOS0yNCAwNzoyOToxMC44MTk6IGhvdw==
2018-09-24 07:29:26.209:
2018-09-24 07:29:26.209: MjAxOC0wOS0yNCAwNzoyOToxMi4zNDk6IGFyZQ==
2018-09-24 07:29:26.209:
2018-09-24 07:29:26.209: MjAxOC0wOS0yNCAwNzoyOToxNi41NDA6IHlvdQ==
2018-09-24 07:29:26.209:
2018-09-24 07:29:26.209: Bye - End of Calc!

So, AFAICT, you need to find a way to make Python read lines from standard input when standard input comes from a pipe — and to read them before the buffer is full. Until then, you won't be successful. You may have similar problems with ensuring the output is flushed properly (timely) too. You could further refine the testing I did by timestamping the input Python sees, maybe reporting the timestamped data to standard error (so you know what Python is doing), but I'm moderately convinced that my setup sends the data to Python timely but Python is not recognizing that.

Searching with Google on 'python line buffered standard input' comes up with Setting smaller buffer size for sys.stdin which suggests using python -u. When I use timecmd -m -- python -u pycalc.py in my long command line, then I get the expected output — Python responds as the data is generated.

$ timecmd -m -- cat -u | tstamp -f3 | tee /dev/tty | timecmd -m -- python -u pycalc.py |
> tstamp -f3 | tee $(isodate -c).log
2018-09-24 07:52:41.485 [PID 45180] cat -u
2018-09-24 07:52:41.485 [PID 45181] python -u pycalc.py
dude
2018-09-24 07:52:43.213: dude
2018-09-24 07:52:43.214: MjAxOC0wOS0yNCAwNzo1Mjo0My4yMTM6IGR1ZGU=
2018-09-24 07:52:43.215:
how
2018-09-24 07:52:48.852: how
2018-09-24 07:52:48.852: MjAxOC0wOS0yNCAwNzo1Mjo0OC44NTI6IGhvdw==
2018-09-24 07:52:48.852:
are
2018-09-24 07:52:50.720: are
2018-09-24 07:52:50.720: MjAxOC0wOS0yNCAwNzo1Mjo1MC43MjA6IGFyZQ==
2018-09-24 07:52:50.720:
you
2018-09-24 07:52:52.479: you
2018-09-24 07:52:52.479: MjAxOC0wOS0yNCAwNzo1Mjo1Mi40Nzk6IHlvdQ==
2018-09-24 07:52:52.479:
2018-09-24 07:52:53.646 [PID 45180; status 0x0000]  -  12.161s
2018-09-24 07:52:53.647: Bye - End of Calc!
2018-09-24 07:52:53.650 [PID 45181; status 0x0000]  -  12.165s

You can see from the timestamps how this is reacting as the typing (deliberately slow typing) occurs. Translating this into what your program needs to do, I come up with:

#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define PARENT_READ read_pipe[0]
#define PARENT_WRITE write_pipe[1]
#define CHILD_WRITE read_pipe[1]
#define CHILD_READ  write_pipe[0]

int main(void)
{
    int pid;
    int read_pipe[2];
    int write_pipe[2];

    if (pipe(read_pipe) < 0)
        exit(-1);
    if (pipe(write_pipe) < 0)
        exit(-1);

    if ((pid = fork()) < 0)
        exit(-1);
    else if (pid == 0)
    {
        // Child process go here
        close(PARENT_READ);
        close(PARENT_WRITE);

        dup2(CHILD_READ, STDIN_FILENO);
        dup2(CHILD_WRITE, STDOUT_FILENO);
        close(CHILD_READ);
        close(CHILD_WRITE);
        char *argv[] = {"python", "-u", "pycalc.py", NULL};
        execvp(argv[0], argv);
        fprintf(stderr, "failed to execute %s (%d: %s)\n", argv[0], errno, strerror(errno));
        exit(errno);
    }

    // Parent process go here
    close(CHILD_READ);
    close(CHILD_WRITE);
    FILE *wrp = fdopen(PARENT_WRITE, "w");
    FILE *rdp = fdopen(PARENT_READ, "r");
    if (wrp == NULL || rdp == NULL)
        exit(-1);
    setlinebuf(wrp);
    setlinebuf(rdp);

    char line[1024 + 1];
    while (fgets(line, sizeof(line), stdin) != NULL)
    {
        printf("Got this to do calc with: %s", line);
        fprintf(wrp, "%s", line);
        fflush(wrp);
        fgets(line, sizeof(line), rdp);
        printf("Got this calc from child: %s", line);
    }

    return(0);
}

And when run (the program is pipe43), I get:

$ pipe43
dude
Got this to do calc with: dude
Got this calc from child: ZHVkZQ==
how
Got this to do calc with: how
Got this calc from child:
are
Got this to do calc with: are
Got this calc from child: aG93
you
Got this to do calc with: you
Got this calc from child:
Traceback (most recent call last):
  File "pycalc.py", line 15, in <module>
    print('Bye - End of Calc!')
IOError: [Errno 32] Broken pipe

The error from Python is because it tries to write after the parent process has closed its pipe — it doesn't wait for stray output from the Python.

I'm not sure of the efficiency implications of the -u option to Python; I suspect they're not great.

Peter Mortensen
  • 30,738
  • 21
  • 105
  • 131
Jonathan Leffler
  • 730,956
  • 141
  • 904
  • 1,278
  • Python 2.7 buffering is different from Python 3.x buffering. If nothing else works, use `os.read(f.fileno(), 8192)`, that returns as soon as there is input data available. FYI `sys.stdin.fileno()` is 0. – pts May 06 '23 at 19:13