Given a config file like this one from the Moses Machine Translation Toolkit:
#########################
### MOSES CONFIG FILE ###
#########################
# input factors
[input-factors]
0
# mapping steps
[mapping]
0 T 0
[distortion-limit]
6
# feature functions
[feature]
UnknownWordPenalty
WordPenalty
PhrasePenalty
PhraseDictionaryMemory name=TranslationModel0 num-features=4 path=/home/gillin/jojomert/phrase-jojo/work.src-ref/training/model/phrase-table.gz input-factor=0 output-factor=0
LexicalReordering name=LexicalReordering0 num-features=6 type=wbe-msd-bidirectional-fe-allff input-factor=0 output-factor=0 path=/home/gillin/jojomert/phrase-jojo/work.src-ref/training/model/reordering-table.wbe-msd-bidirectional-fe.gz
Distortion
KENLM lazyken=0 name=LM0 factor=0 path=/home/gillin/jojomert/ru.kenlm order=5
# dense weights for feature functions
[weight]
UnknownWordPenalty0= 1
WordPenalty0= -1
PhrasePenalty0= 0.2
TranslationModel0= 0.2 0.2 0.2 0.2
LexicalReordering0= 0.3 0.3 0.3 0.3 0.3 0.3
Distortion0= 0.3
LM0= 0.5
I need to read the parameters from the [weight] section:
UnknownWordPenalty0= 1
WordPenalty0= -1
PhrasePenalty0= 0.2
TranslationModel0= 0.2 0.2 0.2 0.2
LexicalReordering0= 0.3 0.3 0.3 0.3 0.3 0.3
Distortion0= 0.3
LM0= 0.5
I have been doing it like this:
def read_params_from_moses_ini(mosesinifile):
    parameters_string = ""
    # Walk the file backwards and stop at the [weight] header.
    for line in reversed(open(mosesinifile, 'r').readlines()):
        if line.startswith('[weight]'):
            return parameters_string
        else:
            parameters_string += line.strip() + ' '
to get this output:
LM0= 0.5 Distortion0= 0.3 LexicalReordering0= 0.3 0.3 0.3 0.3 0.3 0.3 TranslationModel0= 0.2 0.2 0.2 0.2 PhrasePenalty0= 0.2 WordPenalty0= -1 UnknownWordPenalty0= 1
Then I parse the output with:
import re

moses_param_pattern = re.compile(r'''([^\s=]+)=\s*((?:[^\s=]+(?:\s|$))*)''')

def parse_parameters(parameters_string):
    # Map each "name=" key to the list of floats that follows it.
    return dict((k, list(map(float, v.split())))
                for k, v in moses_param_pattern.findall(parameters_string))

mosesinifile = 'mertfiles/moses.ini'
print(parse_parameters(read_params_from_moses_ini(mosesinifile)))
to get:
{'UnknownWordPenalty0': [1.0], 'PhrasePenalty0': [0.2], 'WordPenalty0': [-1.0], 'Distortion0': [0.3], 'LexicalReordering0': [0.3, 0.3, 0.3, 0.3, 0.3, 0.3], 'TranslationModel0': [0.2, 0.2, 0.2, 0.2], 'LM0': [0.5]}
The current solution involves reading the config file's lines in reverse and then applying a fairly complicated regex to extract the parameters.
Is there a simpler or less hacky/verbose way to read the file and achieve the desired parameter dictionary output?
Is it possible to adapt configparser so that it reads the Moses config file? That seems pretty hard because the file has sections that are really parameters, e.g. [distortion-limit], whose value 6 has no key. In a valid configparser-readable file, it would have been distortion-limit = 6.
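To illustrate the problem, here is a minimal sketch of what the standard-library parser does with a keyless value under its default settings. `SAMPLE` is a shortened, inlined stand-in for the file above (an assumption for the sake of a self-contained example), not the real moses.ini:

```python
import configparser

# Shortened, inlined stand-in for the config above (not the full moses.ini).
SAMPLE = """\
[distortion-limit]
6
[weight]
LM0= 0.5
"""

parser = configparser.ConfigParser()  # default settings, no extra options
error = None
try:
    parser.read_string(SAMPLE)
except configparser.ParsingError as exc:
    # The bare "6" has neither a key nor a "=" / ":" delimiter.
    error = exc

print(type(error).__name__)
```

With the defaults, the bare 6 under [distortion-limit] is rejected before the [weight] section can be used.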
Note: Python's native configparser is unable to handle a moses.ini config file, so the answers from "How to read and write INI file with Python3?" will not work.
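For comparison, a single forward pass that tracks the current section header might avoid both the reversal and the regex. This is only a sketch under assumptions: `read_moses_weights` is a hypothetical helper name, and it assumes every line in the [weight] section has the form `name= v1 v2 ...` with numeric values, as in the file shown above.

```python
def read_moses_weights(path, section="[weight]"):
    """Collect `name= v1 v2 ...` lines from one section into a dict."""
    weights = {}
    in_section = False
    with open(path) as fin:
        for raw in fin:
            line = raw.strip()
            if not line or line.startswith("#"):
                continue  # skip blank lines and comments
            if line.startswith("["):
                # A new header: remember whether it is the one we want.
                in_section = (line == section)
                continue
            if in_section and "=" in line:
                name, _, values = line.partition("=")
                weights[name.strip()] = [float(v) for v in values.split()]
    return weights
```

Calling `read_moses_weights('mertfiles/moses.ini')` should then produce the same kind of dictionary as the regex-based version, though I have not checked it against other Moses configs.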