I am trying to update a Hunspell spelling dictionary. My synonym file looks something like this:
mylist="""
specimen|3
sample
prototype
example
sample|3
prototype
example
specimen
prototype|3
example
specimen
sample
example|3
specimen
sample
prototype
prototype|1
illustration
"""
The first step is to merge duplicate entries. In the example above, the headword "prototype" appears twice, so its two entries need to be combined. Its count changes from 3 to 4 because the synonym "illustration" is added:
specimen|3
sample
prototype
example
sample|3
prototype
example
specimen
prototype|4
example
specimen
sample
illustration
example|3
specimen
sample
prototype
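A minimal sketch of step 1, assuming the simplified layout shown above (a `word|count` header followed by exactly `count` synonym lines) and assuming the last header is spelled "prototype". The names `parse_thesaurus` and `render_thesaurus` are hypothetical helpers, not part of Hunspell:

```python
def parse_thesaurus(text):
    """Parse 'word|count' blocks into {word: [synonyms]}, merging duplicate headers."""
    entries = {}  # dicts keep insertion order in Python 3.7+
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    i = 0
    while i < len(lines):
        word, count = lines[i].split("|")
        count = int(count)
        # Reuse the existing list when the header was already seen.
        synonyms = entries.setdefault(word, [])
        for syn in lines[i + 1 : i + 1 + count]:
            if syn not in synonyms:
                synonyms.append(syn)
        i += 1 + count  # jump to the next header line
    return entries


def render_thesaurus(entries):
    """Write the entries back out, recomputing each count from the merged list."""
    out = []
    for word, synonyms in entries.items():
        out.append("%s|%d" % (word, len(synonyms)))
        out.extend(synonyms)
    return "\n".join(out)


sample = """
specimen|3
sample
prototype
example
sample|3
prototype
example
specimen
prototype|3
example
specimen
sample
example|3
specimen
sample
prototype
prototype|1
illustration
"""

merged = parse_thesaurus(sample)
print(render_thesaurus(merged))
```

Because the count is recomputed from the merged list rather than edited by hand, the "prototype" header comes out as `prototype|4` without any line-number bookkeeping.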
The second step is more complicated. Merging duplicates is not enough: the added word should also be propagated to the linked words. In this case I need to search for "prototype" in every synonym list and, wherever it is found, add "illustration" to that list. The final list of words will look like this:
specimen|4
sample
prototype
example
illustration
sample|4
prototype
example
specimen
illustration
prototype|4
example
specimen
sample
illustration
example|4
specimen
sample
prototype
illustration
Finally, the new word "illustration" should be added to the list as an entry of its own, with all 4 linked words:
illustration|4
example
specimen
sample
prototype
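One possible way to sketch step 2 is to treat the entries as a graph and give every word all the other words it is linked to, directly or indirectly. This assumes that synonymy should be treated as symmetric and transitive, which matches the expected output above but may over-merge words that have several unrelated senses. The function name `propagate_synonyms` is hypothetical, and the input is the step 1 result written out as a dict:

```python
from collections import deque


def propagate_synonyms(entries):
    """Return {word: [all linked words]} for every header and synonym."""
    # Undirected links: a header and each of its synonyms point at each other.
    links = {}
    for word, synonyms in entries.items():
        links.setdefault(word, [])
        for syn in synonyms:
            links.setdefault(syn, [])
            if syn not in links[word]:
                links[word].append(syn)
            if word not in links[syn]:
                links[syn].append(word)

    result = {}
    for word in links:
        # Breadth-first walk collects every word reachable from `word`.
        seen = {word}
        queue = deque([word])
        group = []
        while queue:
            current = queue.popleft()
            for other in links[current]:
                if other not in seen:
                    seen.add(other)
                    group.append(other)
                    queue.append(other)
        # Keep the existing synonym order, then append the newly linked words.
        existing = list(entries.get(word, []))
        result[word] = existing + [w for w in group if w not in existing]
    return result


# Step 1 result from the question, written as a dict.
merged = {
    "specimen": ["sample", "prototype", "example"],
    "sample": ["prototype", "example", "specimen"],
    "prototype": ["example", "specimen", "sample", "illustration"],
    "example": ["specimen", "sample", "prototype"],
}

expanded = propagate_synonyms(merged)
for word, synonyms in expanded.items():
    print("%s|%d" % (word, len(synonyms)))
    for syn in synonyms:
        print(syn)
```

The four existing entries each gain "illustration" at the end, and "illustration" gets its own entry with 4 words. The order of synonyms inside the new "illustration" entry follows the graph walk, so it may differ from the order shown above.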
What I have tried:
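import io

myfile = io.StringIO(mylist)
for lineno, line in enumerate(myfile):
    # Header lines look like "word|count"; synonym lines have no "|".
    parts = line.strip().split("|")
    try:
        if int(parts[1]) > 0:
            print(lineno, parts[0], int(parts[1]))
    except (IndexError, ValueError):
        # Not a header line, so skip it.
        pass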
The code above prints each header word together with its line number and count:
1 specimen 3
5 sample 3
9 prototype 3
13 example 3
17 prototype 1
This means I need to take the 1 word on line 18 ("illustration") and append it as the 4th synonym of the entry found on line 9 ("prototype"). If I can do this, step 1 of the task will be complete.