I´m working on a Python program supposed to read incoming MS-Word documents in a client/server fashion, i.e. the client sends a request (one or multiple MS-Word documents) and the server reads specific content from those requests using pythoncom and win32com.
Because I want to minimize waiting time for the client (client needs a status message from server, I do not want to open an MS-Word instance for every request. Hence, I intend to have a pool of running MS-Word instances from which the server can pick and choose. This, in turn, means I have to reuse those instances from the pool in different threads and this is what causes trouble right now. After I fixed the following error I asked previously on stack overflow, my code looks now like this:
import pythoncom, win32com.client, threading, psutil, os, queue, time, datetime
class WordInstance:
def __init__(self,app):
self.app = app
self.flag = True
appPool = {'WINWORD.EXE': queue.Queue()}
def initAppPool():
global appPool
wordApp = win32com.client.DispatchEx('Word.Application')
appPool["WINWORD.EXE"].put(wordApp) # For testing purpose I only use one MS-Word instance currently
def run_in_thread(instance,appid, path):
print(f"[{datetime.now()}] open doc ... {threading.current_thread().name}")
pythoncom.CoInitialize()
wordApp = win32com.client.Dispatch(pythoncom.CoGetInterfaceAndReleaseStream(appid, pythoncom.IID_IDispatch))
doc = wordApp.Documents.Open(path)
doc.SaveAs(rf'{path}.FB.pdf', FileFormat=17)
doc.Close()
print(f"[{datetime.now()}] close doc ... {threading.current_thread().name}")
instance.flag = True
if __name__ == '__main__':
initAppPool()
pathOfFile2BeRead1 = r'C:\Temp\file4.docx'
pathOfFile2BeRead2 = r'C:\Temp\file5.docx'
#treat first request
wordApp = appPool["WINWORD.EXE"].get(True, 10)
wordApp.flag = False
pythoncom.CoInitialize()
wordApp_id = pythoncom.CoMarshalInterThreadInterfaceInStream(pythoncom.IID_IDispatch, wordApp.app)
readDocjob1 = threading.Thread(target=run_in_thread,args=(wordApp,wordApp_id,pathOfFile2BeRead1), daemon=True)
readDocjob1.start()
appPool["WINWORD.EXE"].put(wordApp)
#wait here until readDocjob1 is done
wait = True
while wait:
try:
wordApp = appPool["WINWORD.EXE"].get(True, 1)
if wordApp.flag:
print(f"[{datetime.now()}] ok appPool extracted")
wait = False
else:
appPool["WINWORD.EXE"].put(wordApp)
except queue.Empty:
print(f"[{datetime.datetime.now()}] error: appPool empty")
except BaseException as err:
print(f"[{datetime.datetime.now()}] error: {err}")
wordApp.flag = False
openDocjob2 = threading.Thread(target=run_in_thread,args=(wordApp,wordApp_id,pathOfFile2BeRead2), daemon=True)
openDocjob2.start()
When I run the script I receive the following output printed on the terminal:
[2022-03-29 11:41:08.217678] open doc ... Thread-1
[2022-03-29 11:41:10.085999] close doc ... Thread-1
[2022-03-29 11:41:10.085999] ok appPool extracted
[2022-03-29 11:41:10.085999] open doc ... Thread-2
Process finished with exit code 0
And only the first word file is converted to a pdf. It seems like def run_in_thread
terminates after the print statement and before/during pythoncom.CoInitialize()
. Sadly I do not receive any error message which makes it quite hard to understand the cause of this behavior.
After reading into Microsofts documentation I tried using
pythoncom.CoInitializeEx(pythoncom.APARTMENTTHREADED)
instead of pythoncom.CoInitialize()
. Since my COM object needs to be called by multiple threads. However this changed nothing.