
In the terminal, if I execute

docker run -it --rm --name consumer --link zookeeper:zookeeper --link kafka:kafka \
    debezium/kafka:1.1 watch-topic -a bankserver1.bank.holding \
    > C:/Users/User/python_test/holding_pivot.txt

then the output is written to the file holding_pivot.txt like this:

WARNING: Using default BROKER_ID=1, which is valid only for non-clustered installations.
Using ZOOKEEPER_CONNECT=172.17.0.3:2181
Using KAFKA_ADVERTISED_LISTENERS=PLAINTEXT://172.17.0.6:9092
Using KAFKA_BROKER=172.17.0.4:9092
Contents of topic bankserver1.bank.holding:
{"schema":{"type":"struct","fields":[{"type":"struct","fiel...
{"schema":{"type":"struct","fields":[{"type":"struct","file...

I want a Python script that handles the output *before* it is written to the log file. (I don't want a script that reads from the log afterwards, because then I would have to distinguish which lines have already been read and handled and which have not.)

The processing flow should look like this:

RESULT -> Python File (handle) -> writing to log file.

MAIN QUESTION: Which Python library supports this? Can you give me some sample code?

I saw a document like this:

docker run -it --rm --name consumer --link zookeeper:zookeeper --link kafka:kafka \
debezium/kafka:1.1 watch-topic -a bankserver1.bank.holding \
| grep --line-buffered '^{' | pipe > ./stream.py > ./holding_pivot.txt
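
(That last part looks garbled, since you cannot redirect *into* a script; presumably the intended pipeline pipes into the script and redirects its output, something like this, where stream.py is the filter script:)

docker run -it --rm --name consumer --link zookeeper:zookeeper --link kafka:kafka \
    debezium/kafka:1.1 watch-topic -a bankserver1.bank.holding \
    | grep --line-buffered '^{' | python ./stream.py > ./holding_pivot.txt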

===================== UPDATE =======================

Following @piertoni's suggestion, I added this:

docker run -it --rm --name consumer --link zookeeper:zookeeper --link kafka:kafka \
    debezium/kafka:1.1 watch-topic -a bankserver1.bank.holding \
    | python C:/Users/User/python_test/test_pipe.py > C:/Users/User/python_test/holding_pivot.txt

Here is my attempt, test_pipe.py:

#!/usr/bin/env python3 -u
# Note: -u means unbuffered (i.e. output goes straight to stdout without being buffered first)

import fileinput
import sys

# Attempt 1: iterate stdin directly
for line in sys.stdin:
    with open('log.txt', 'a') as wr:
        wr.write("Pipe success")

# Attempt 2: a single write, just to prove the script ran at all
with open('log.txt', 'a') as wr:
    wr.write("Pipe success")

# Attempt 3: read via fileinput
with fileinput.input() as f:
    for line in f:
        with open('log.txt', 'a') as wr:
            wr.write(f"Argument List: {line}")

However, nothing is written to log.txt. (I only need the Python script to be able to read the output of the initial pipe.)

  • If I understand correctly you want the possibility to pipe the result to a python script? If this is correct check this answer: https://stackoverflow.com/questions/66225702/piping-to-a-python-script-from-cmd-shell/66225919#66225919 – piertoni Nov 15 '21 at 10:42
  • Thanks @piertoni, but it's still not working; check out the update in my post. Any other suggestions? – Chau Loi Nov 15 '21 at 11:00
  • 'it is not working': **what** is not working? What do you expect the code to do? What does it actually do? – 2e0byo Nov 15 '21 at 11:03
  • why, incidentally, are you redirecting the stdout of your python script to a file, given that your python script never actually writes anything to stdout? – 2e0byo Nov 15 '21 at 11:04
  • Am I right in thinking you want your python script to act as a *filter* for piped data? So it reads from stdin, does something with the line, and then writes to stdout? – 2e0byo Nov 15 '21 at 11:05
  • Hi @2e0byo, I just need the Python file to be able to read the input content (the logic is another story). "It's not working" here means nothing is written to log.txt; my bad, I will adjust it. – Chau Loi Nov 15 '21 at 11:06
  • @2e0byo Yes, that is what I want. Thanks :))) – Chau Loi Nov 15 '21 at 11:09
  • Note btw that among the odd things in your code (which seems to try the same thing three different ways?) is the fact that you open the file handle for every line in stdin. Since nothing else would be touching the file, you should open the file handle *once* and then iterate sys.stdin. Otherwise you have a needless performance overhead. – 2e0byo Nov 15 '21 at 11:18

1 Answer


This really is exactly the same case as the linked question but to make things completely explicit, here's how you would write a stream filter using python:

You need to read from stdin, process and then write to stdout. The former can be done any number of ways but the easiest is to use fileinput. The latter can also be done any number of ways, but I personally would just use print with an explicit sys.stdout.flush() if need be. Thus putting it all together:

import fileinput
import sys

with fileinput.input() as f:
    for line in f:
        line = line.title() # filter your lines here
        print(line, end="")
        sys.stdout.flush()
Then test it step by step:

dir | python test.py                    # try this first, to check it works
docker_cmd | python test.py             # try this second
docker_cmd | python test.py > log.txt   # send the output to the log

Note that either you send stdout to a file with shell redirection or you handle the file writing inside python (unless you plan on printing different stuff to stdout than you write to the file, which doesn't seem to be the case).
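
If you would rather do the file writing inside Python, a minimal sketch (note it opens the log handle once, rather than once per line, as discussed in the comments above) would be something like:

import sys

# Open the log once; reopening it for every line is needless overhead.
with open("log.txt", "a") as log:
    for line in sys.stdin:
        log.write(line)  # apply your filtering/transformation here first
        log.flush()      # flush so the log grows as the stream arrives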

Buffering

Buffering can mean two things: the terminal buffering streams, and python's own buffering. -u (at least in python 3.9) doesn't unbuffer stdin. Some techniques for unbuffering are discussed in this question. Running with strace confirms that you really can unbuffer the input (at least on my system) so something like this ought to avoid all python's buffering:

import sys

# note: the walrus operator (:=) requires Python >= 3.8
while line := sys.stdin.buffer.raw.readline():
    print(line.decode(), end="")

This is slower on my system than just iterating stdin, presumably because of the decode. You can try:

import os
import sys

u_stdin = os.fdopen(sys.stdin.fileno(), "r", buffering=1)

That gets you a line-buffered text stdin stream. Apparently the bug with .next which broke iterating the stdin object has been fixed. But none of this is any good if the terminal is buffering the pipe. On Linux I can use unbuffer. This question proposes a way round that which may get things working for you.
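
Once you have u_stdin you would iterate it exactly as you iterate sys.stdin, e.g.:

for line in u_stdin:
    print(line, end="")  # filter the line here as needed
    sys.stdout.flush()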

If you are able to provide a reproducibly buffered command I could test further. I can get a buffered stream with man ssh | or the like (or tail -f whilst appending to a file), but am unable to reproduce failing to read it.

Lastly, I assume your docker cmd really is outputting to stdout? Not, say, to stderr? What happens if you add 2>&1 to your docker command to send stderr to stdout? Can you set up a docker command which runs to completion in a few seconds so you can test that python never reads anything? Buffering should slow reads down, but it should not stop data getting through.
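
Concretely, the stderr check would look something like this:

docker run -it --rm --name consumer --link zookeeper:zookeeper --link kafka:kafka \
    debezium/kafka:1.1 watch-topic -a bankserver1.bank.holding 2>&1 \
    | python test_pipe.py > holding_pivot.txt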

  • Thanks for your contribution, but nothing is written to the log. The sample code you provided works well. I think the problem comes from the fact that the output from docker is a stream (it keeps producing results whenever anything changes), not a one-time output. – Chau Loi Nov 15 '21 at 14:25
  • @ChauLoi ah; sorry. perhaps it *is* a buffering problem then. Can you reproduce it without docker? do you have `unbuffer` available to force the stream not to be buffered (in case the terminal is doing it?). I'll also update with notes re. buffering – 2e0byo Nov 15 '21 at 16:24
  • @ChauLoi also can I confirm the *second* command doesn't work as well, i.e. it doesn't print anything? – 2e0byo Nov 15 '21 at 16:25