7

I'm looking at a dictionary file (".dic") and its associated ".aff" file. I'm trying to combine the rules in the ".aff" file with the words in the ".dic" file to produce a complete list of every word the dictionary contains.

The documentation for these files is difficult to find. Does anyone know of a resource I can learn from?

Is there existing code that already does this, so that I don't duplicate the effort?

Thanks!

wordless

4 Answers

5

Following Pillowcase's answer, here is a usage example:

# Download dictionary (create the target directory first, since wget -O does not create it)
mkdir -p ./dic
wget -O ./dic/es_ES.aff "https://raw.githubusercontent.com/sbosio/rla-es/master/source-code/hispalabras-0.1/hispalabras/es_ES.aff"
wget -O ./dic/es_ES.dic "https://raw.githubusercontent.com/sbosio/rla-es/master/source-code/hispalabras-0.1/hispalabras/es_ES.dic"

# Compile program
wget -O ./dic/unmunch.cxx "https://raw.githubusercontent.com/hunspell/hunspell/master/src/tools/unmunch.cxx"
wget -O ./dic/unmunch.h "https://raw.githubusercontent.com/hunspell/hunspell/master/src/tools/unmunch.h"
g++ -o ./dic/unmunch ./dic/unmunch.cxx

# Generate dictionary
./dic/unmunch ./dic/es_ES.dic ./dic/es_ES.aff 2> /dev/null > ./dic/es_ES.txt.bk
sort ./dic/es_ES.txt.bk > ./dic/es_ES.txt # Optional
rm ./dic/es_ES.txt.bk # Optional
Rubén Morales
  • 1
    Great! I'm sharing the generated file (it's unsorted) for those who want an **unmunched Spanish dictionary** and don't have access to a Linux terminal: [es_ES.txt](https://www.4shared.com/office/PL_9bh_diq/es_ES.html) – Leopoldo Sanczyk Jun 15 '21 at 01:41
2

These could be Hunspell dictionary files. Unfortunately, the command that creates a "global" (unmunched) word list only fully supports simple .aff and .dic files.

From the documentation:

unmunch: list all recognized words of a MySpell dictionary

Syntax:

unmunch dic_file affix_file

Try it and see what happens. For generating all wordforms for one word only, look here.
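To make concrete what "applying the aff rules to the dic words" means, here is a minimal sketch in shell and awk. It is not how unmunch is implemented; it only handles the simplest case, under these assumptions: flags are single characters, only `SFX` rules are applied (no prefixes, cross products, compounding, or continuation flags), and each rule line has the form `SFX flag strip add condition`. The function name `expand_simple` and the demo files are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: expand_simple DIC_FILE AFF_FILE
# Prints every stem plus each form produced by a matching SFX rule.
expand_simple() {
  awk '
    # First file (.aff): collect suffix rules; header lines have only 4 fields.
    FNR == NR {
      if ($1 == "SFX" && NF >= 5) {
        n++
        flag[n]  = $2
        strip[n] = ($3 == "0") ? "" : $3   # "0" means strip nothing
        add[n]   = ($4 == "0") ? "" : $4   # "0" means add nothing
        cond[n]  = $5                      # condition on the end of the word
      }
      next
    }
    # Second file (.dic): the first line is a word count, not a word.
    FNR == 1 && $0 ~ /^[0-9]+$/ { next }
    NF == 0 { next }
    {
      split($0, parts, "/")
      word = parts[1]; flags = parts[2]
      print word
      for (i = 1; i <= n; i++) {
        if (index(flags, flag[i]) == 0) continue   # word lacks this flag
        if (word !~ (cond[i] "$")) continue        # condition must match the word ending
        stem = word
        if (strip[i] != "") {
          tail = substr(word, length(word) - length(strip[i]) + 1)
          if (tail != strip[i]) continue
          stem = substr(word, 1, length(word) - length(strip[i]))
        }
        print stem add[i]
      }
    }
  ' "$2" "$1"
}

# Demo with a tiny made-up dictionary.
tmp=$(mktemp -d)
cat > "$tmp/demo.aff" <<'EOF'
SFX D Y 2
SFX D 0 ed [^ey]
SFX D 0 d e
SFX S Y 2
SFX S y ies [^aeiou]y
SFX S 0 s [^sxy]
EOF
cat > "$tmp/demo.dic" <<'EOF'
3
walk/DS
bake/D
city/S
EOF
expand_simple "$tmp/demo.dic" "$tmp/demo.aff"
# prints: walk walked walks bake baked city cities (one per line)
rm -rf "$tmp"
```

Real dictionaries usually rely on features this sketch ignores, so treat it as a way to understand the file format rather than as a replacement for a proper tool.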

Community
Pillowcase
2

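You need a utility called unmunch (unmunch.exe on Windows) to apply the aff rules to the dic file. Its counterpart, munch, does the reverse: it compresses a plain word list into a dic file using the rules of an aff file.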

Mode
0

As other answers have pointed out, the command-line program unmunch, from Hunspell, supposedly does what you want. However, this program is outdated and very buggy. See this answer for more detail and alternatives.

Maëlan