I'm a newbie with regex and am pretty sure this question has been answered somewhere, but I haven't succeeded in tweaking what I've found to do the job. I'm working with a dictionary file that contains repeated headwords, which cause the compiler to fail. So I need to match exact headwords (none of which contain characters such as "[" and "<") at the beginning of a line and delete the repetitions. There are many, many duplicate headwords across the file, so I would like to replace the matches automatically. Here's an example from the dictionary:
aGga
<© aGga @>
[m1]aṅgá [/m]
[m2][trn][i]pel. ¤1.¤ emphatic[/i]: just, only; especially; ¤2,¤ [i]exhortative[/i]: [i]w. voc. or impv.[/i]; ¤3.¤ [i]intr.[/i]: [/trn][/m]
[m2][trn][b]kim aṅga,[/b] how much more?[/trn][/m]
aGga
<© aGga @>
[m1]áṅga [/m]
[m2][trn][i]m. pl. No of a people and their country.[/i][/trn][/m]
Here I would need to match the identical headwords ("aGga") and then delete the second, third, etc., instances (here the second "aGga") as well as the line that follows each of them (which happens to be enclosed between < and >: "<© aGga @>"), producing this desired output:
aGga
<© aGga @>
[m1]aṅgá [/m]
[m2][trn][i]pel. ¤1.¤ emphatic[/i]: just, only; especially; ¤2,¤ [i]exhortative[/i]: [i]w. voc. or impv.[/i]; ¤3.¤ [i]intr.[/i]: [/trn][/m]
[m2][trn][b]kim aṅga,[/b] how much more?[/trn][/m]
[m1]áṅga [/m]
[m2][trn][i]m. pl. No of a people and their country.[/i][/trn][/m]
I've seen three instances of the same headword, so I need to handle more than just one repetition of any given headword.
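To make the logic concrete, here is a rough Python sketch of what I'm trying to achieve. It is only an illustration under some assumptions: a headword line is any non-empty line containing neither "[" nor "<", the marker line to drop is the line directly after a repeated headword and starts with "<", and the function name drop_repeated_headwords is just something made up for this example.

```python
def drop_repeated_headwords(lines):
    """Keep the first occurrence of each headword; drop later occurrences
    together with the "<...>" marker line that directly follows them."""
    seen = set()          # headwords already emitted
    out = []
    skip_marker = False   # True right after a repeated headword was dropped

    for line in lines:
        text = line.rstrip("\n")
        # Assumption: headword lines never contain "[" or "<".
        is_headword = bool(text.strip()) and "[" not in text and "<" not in text

        if is_headword:
            if text in seen:
                skip_marker = True   # drop this repeat and watch for its marker line
                continue
            seen.add(text)
            skip_marker = False
        elif skip_marker and text.startswith("<"):
            skip_marker = False      # drop the marker line of the repeated headword
            continue
        else:
            skip_marker = False

        out.append(line)
    return out
```

The function takes the file's lines as a list (for example from readlines() on a file opened with the right encoding, which I haven't assumed here) and returns the filtered list, so any number of repetitions of a headword is handled, not just the second one.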
My attempts so far at just matching identical headwords (such as "^(.+?\s)", based on this question) are returning too much. I'm mostly using the regex find and replace function in Sublime Text, but would be happy to do this in any way possible. I know this is probably really simple and boring for regex gurus, so thanks for taking the time to help a newbie.