I have a file with 120k lines, and each line has to be processed by an external application. For every line I start a subprocess and send the line to its stdin. The application takes at least a second to start, so startup time is the real bottleneck.
I am looking for a way to start the process once and then send data to it line by line.
My current code:
    from subprocess import PIPE, Popen, TimeoutExpired

    # Not pictured: the loop that iterates over all lines.
    # Here, text is the line I need to pass to the application.
    pdebug("Sending to tomita:\n----\n", text, "\n----")
    try:
        p = Popen(['tomita/tomitaparser.exe', "tomita/config.proto"], stdout=PIPE, stdin=PIPE, stderr=PIPE)
        stdout_data, stderr_data = p.communicate(input=bytes(text, 'UTF-8'), timeout=45)
        pdebug("Tomita returned stderr:\n", "stderr: " + stderr_data.decode("utf-8").strip() + "\n")
    except TimeoutExpired:
        p.kill()
        pdebug("Tomita killed")
    stdout_data = stdout_data.decode("utf-8")
    facts = parse_tomita_output(stdout_data)
    pdebug('Received facts:\n----\n', str(facts), "\n----")
I then tried starting the process once and calling communicate() for every line:
    try:
        p = Popen(['tomita/tomitaparser.exe', "tomita/config.proto"], stdout=PIPE, stdin=PIPE, stderr=PIPE)
        for news_line in news:
            pdebug("Sending to tomita:\n----\n", news_line.text, "\n----")
            stdout_data, stderr_data = p.communicate(input=bytes(news_line.text, 'UTF-8'), timeout=45)
            pdebug("Tomita returned stderr:\n", stderr_data.decode("utf-8").strip() + "\n")
            stdout_data = stdout_data.decode("utf-8")
            facts = parse_tomita_output(stdout_data)
            pdebug('Received facts:\n----\n', str(facts), "\n----")
            news_line.grammemes = facts
    except TimeoutExpired:
        p.kill()
        pdebug("Tomita killed due to timeout")
The second version produces this error:

    ValueError: Cannot send input after starting communication
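For context, the error can be reproduced without Tomita. communicate() writes the input, closes stdin, and waits for the process to exit, so a second call with more input has nothing left to write to. Below is a minimal sketch that uses a throwaway Python child process as a stand-in for tomitaparser.exe; the child script and the function name are hypothetical and only there for illustration.

```python
import subprocess
import sys

# Stand-in for the external application (an assumption, not Tomita):
# a child Python process that echoes all of its stdin back in upper case.
CHILD = "import sys; sys.stdout.write(sys.stdin.read().upper())"


def second_communicate_error():
    """Call communicate() twice on one process; return (output, error text)."""
    p = subprocess.Popen([sys.executable, "-c", CHILD],
                         stdin=subprocess.PIPE, stdout=subprocess.PIPE)
    # First call: sends the input, closes stdin, waits for the child to exit.
    first_out, _ = p.communicate(input=b"first line\n", timeout=45)
    try:
        # Second call: communication has already happened, so this raises.
        p.communicate(input=b"second line\n", timeout=45)
    except ValueError as exc:
        return first_out, str(exc)
    return first_out, None


if __name__ == "__main__":
    print(second_communicate_error())
```

So communicate() seems to be a one-shot call per process, which is why moving Popen out of the loop is not enough on its own.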
So is there a way to launch the exe once and then, for each line, send the input, flush stdin, and read the matching output from stdout?
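What I have in mind is roughly the sketch below: keep one process alive, write a line to its stdin, flush, and read one line back from its stdout. It only works if the application answers each input line with exactly one output line and flushes it immediately, which I have not verified for Tomita. The child process is again a hypothetical stand-in, and process_lines is a made-up helper name.

```python
import subprocess
import sys

# Hypothetical stand-in for the external application: reads stdin line by
# line and writes one flushed output line (upper-cased) per input line.
CHILD = (
    "import sys\n"
    "for line in sys.stdin.buffer:\n"
    "    sys.stdout.buffer.write(line.upper())\n"
    "    sys.stdout.buffer.flush()\n"
)


def process_lines(lines):
    """Feed each line to a single long-lived child and collect its replies."""
    p = subprocess.Popen([sys.executable, "-c", CHILD],
                         stdin=subprocess.PIPE, stdout=subprocess.PIPE)
    results = []
    try:
        for line in lines:
            # Send one line and flush so the child sees it right away.
            p.stdin.write(line.encode("utf-8") + b"\n")
            p.stdin.flush()
            # Assumes exactly one output line per input line.
            reply = p.stdout.readline()
            results.append(reply.decode("utf-8").rstrip("\r\n"))
    finally:
        # Closing stdin signals end of input so the child can exit.
        p.stdin.close()
        p.wait(timeout=45)
        p.stdout.close()
    return results


if __name__ == "__main__":
    print(process_lines(["abc", "hello world"]))
```

If the application buffers its output or prints a variable number of lines per input, readline() would block or get out of step, so some other way of detecting the end of each result would be needed. The sketch also leaves stderr unpiped; piping it without reading it could fill the pipe and stall the child.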