I just got a giant 1.4m line dictionary for other programming uses, and i'm sad to see notepad++ is not powerful enough to do the parsing job to the problem. The dictionary contains three types of lines:
<ar><k>-aaltoiseen</k>
yks.ill..ks. <kref>-aaltoinen</kref></ar>
yks.nom. -aaltoinen; yks.gen. -aaltoisen; yks.part. -aaltoista; yks.ill. -aaltoiseen; mon.gen. -aaltoisten -aaltoisien; mon.part. -aaltoisia; mon.ill. -aaltoisiinesim. Lyhyt-, pitkäaaltoinen.</ar>
and I want to extract every word of it to a list of words without duplicates. Lets start by my code.
f = open('dic.txt')
p = open('parsed_dic.txt', 'r+')
lines = f.readlines()
for line in lines:
#<ar><k> lines
#<kref> lines
#ending to ";" - lines
for word in listofwordsfromaline:
p.write(word,"\n")
f.close()
p.close()
Im not particulary asking you how to do this whole thing, but anything would be helpful. A link to a tutorial or one type of line parsing method would be highly appreciated.