I've been trying to build a collection of exhaustive word lists for as many languages as possible, and I ended up using LibreOffice's spell-checking .dic and .aff files. The .dic file contains the base forms of words, and the .aff file contains the rules to inflect them. I found an existing .sh tool that combines these files into a .txt word list.
Because I'm doing this for many languages, I'd like to automate running this tool on each language's .dic and .aff files. I wrote a small Python script for this:
for lang in langs:
    dic_path = os.path.join(lang, [filename for filename in os.listdir(lang) if filename.endswith(".dic")][0])
    aff_path = os.path.join(lang, [filename for filename in os.listdir(lang) if filename.endswith(".aff")][0])
    command = [os.path.join("tools", "unmunch.sh"), dic_path, aff_path]
    outpath = os.path.join(lang, f"{lang}_words.txt")
    with open(outpath, "w") as f:
        subprocess.run(command, stdout=f, shell=True)
The problem is that the file at outpath stays empty. In contrast, this command does write to the desired file:
command = ["type", dic_path]
with open(outpath, "w") as f:
    subprocess.run(command, stdout=f, shell=True)
After trying this, I ran the tool directly in cmd and found that it opens a new cmd window to run. This differs from what I see in Git Bash, which I usually use. In Git Bash I used the command:
tools/unmunch.sh dutch/dutch.dic dutch/dutch.aff >dutch/dutch_words.txt
and it worked. In cmd, however, running:
tools\unmunch.sh dutch\dutch.dic dutch\dutch.aff >dutch\dutch_words.txt
opens a new cmd window and writes the output there instead of to the dutch\dutch_words.txt file. I assume the same thing happens when using subprocess in Python, but I have no idea how to prevent it, as I'm very unfamiliar with .sh files. Can anyone help me get the output written to a desired path?
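
For reference, here is the direction I'm considering but have not verified. My guess is that cmd hands the .sh file to whatever program is associated with that extension, which starts in its own window, so the redirected stdout never reaches it. The sketch below instead passes the script to bash explicitly and drops shell=True. It assumes that `bash` on PATH resolves to the bash that ships with Git for Windows; the helper names `build_unmunch_command` and `write_word_list` are made up for illustration.

```python
import os
import subprocess


def to_posix(path):
    """Use forward slashes so bash does not treat backslashes as escapes."""
    return path.replace(os.sep, "/")


def build_unmunch_command(lang_dir, script=os.path.join("tools", "unmunch.sh"), bash="bash"):
    """Build an argv list that runs the script through bash explicitly.

    Assumption: `bash` is on PATH (for example from Git for Windows).
    """
    names = sorted(os.listdir(lang_dir))
    dic_name = next(n for n in names if n.endswith(".dic"))
    aff_name = next(n for n in names if n.endswith(".aff"))
    return [
        bash,
        to_posix(script),
        to_posix(os.path.join(lang_dir, dic_name)),
        to_posix(os.path.join(lang_dir, aff_name)),
    ]


def write_word_list(lang_dir):
    """Run the tool and send its stdout to <lang>_words.txt in lang_dir."""
    lang = os.path.basename(os.path.normpath(lang_dir))
    out_path = os.path.join(lang_dir, f"{lang}_words.txt")
    with open(out_path, "wb") as f:
        # No shell=True: the argv list goes to bash directly, so the
        # child process inherits the file handle given as stdout.
        subprocess.run(build_unmunch_command(lang_dir), stdout=f, check=True)
    return out_path
```

In my loop this would be called as `write_word_list(lang)` for each entry in `langs`. I used check=True so that a failing script raises an error instead of silently leaving an empty file.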