
Using EmEditor, I want to delete every instance of a repeated string that occupies a full line, together with the line above it. For example, in the text below the repeated string is Cyperus esculentus (it could be anything else), and I want all of its instances deleted, including the line above each one, i.e. the language code. So far, what I have figured out is something like this:

.{2,3} \nCyperus esculentus\n

The problem is that I have to edit this regex by hand for each text, replacing Cyperus esculentus with whichever string is repeated in that text.

ar 
سعد لذيذ
ast 
Cyperus esculentus
azb 
یئمه‌لی توپالاق
az 
Yeməli topalaq
bo 
ཆུ་འབྲུམ།
ca 
Xufa
ceb 
Cyperus esculentus
cs 
Šáchor jedlý
de 
Erdmandel
en 
Cyperus esculentus
eo 
Cyperus esculentus
es 
Cyperus esculentus
eu 
Bedaur
fa 
اویار سلام زرد
fr 
Souchet comestible
gl 
Xunca doce
ha 
Aya
he 
גומא נאכל
id 
Cyperus esculentus
it 
Cyperus esculentus
ja 
ショクヨウガヤツリ
la 
Cyperus esculentus
nl 
Knolcyperus
nv 
Tłʼohigaaí
pl 
Cibora jadalna
pt 
Cyperus esculentus
ru 
Чуфа
srn 
Affo
sv 
Jordmandel
th 
แห้วไทย
tr 
Yer bademi
uk 
Смикавець їстівний
uz 
Yerbodom
vi 
Củ gấu tàu
war 
Cyperus esculentus
zh 
油莎草

The expected result is what is left after applying the regex I mentioned above (to clarify, in these texts only one string is repeated, so the regex does not have to look for several different repeated strings):

ar 
سعد لذيذ
azb 
یئمه‌لی توپالاق
az 
Yeməli topalaq
bo 
ཆུ་འབྲུམ།
ca 
Xufa
cs 
Šáchor jedlý
de 
Erdmandel
eu 
Bedaur
fa 
اویار سلام زرد
fr 
Souchet comestible
gl 
Xunca doce
ha 
Aya
he 
גומא נאכל
ja 
ショクヨウガヤツリ
nl 
Knolcyperus
nv 
Tłʼohigaaí
pl 
Cibora jadalna
ru 
Чуфа
srn 
Affo
sv 
Jordmandel
th 
แห้วไทย
tr 
Yer bademi
uk 
Смикавець їстівний
uz 
Yerbodom
vi 
Củ gấu tàu
zh 
油莎草
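For comparison, here is a minimal sketch of a two-step regex approach in plain JavaScript (not an EmEditor macro). It assumes `\n` line endings and, as stated above, that only one string is repeated; the helper names `escapeRegExp` and `removeRepeatedPairs` are hypothetical. A backreference first finds a line that occurs again later in the text, and that line is then escaped and used to delete every occurrence together with the line above it, so nothing has to be typed in by hand.

```javascript
// Sketch only: plain JavaScript, not EmEditor's macro API.
// Assumes "\n" line endings and a single repeated line, as in the question.

// Escape regex metacharacters so a line can be matched literally.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

function removeRepeatedPairs(text) {
  // Step 1: find the first complete line that appears again on a later line.
  // (.+) captures a whole line; \1$ requires an identical later line.
  const found = /^(.+)\n(?:[\s\S]*?\n)?\1$/m.exec(text);
  if (!found) return text; // no repeated line, nothing to do

  // Step 2: delete every occurrence of that line plus the line above it.
  const pair = new RegExp("^.*\\n" + escapeRegExp(found[1]) + "(?:\\n|$)", "gm");
  return text.replace(pair, "");
}

const sample = "ar \nسعد لذيذ\nast \nCyperus esculentus\nca \nXufa\nen \nCyperus esculentus\nzh \n油莎草";
console.log(removeRepeatedPairs(sample));
```

Because step 1 compares whole lines, a repeated language code would also be picked up; with unique codes, as in this data, only the repeated name is found.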

This is what worked for me:

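document.selection.StartOfDocument(false);
// Delete every line that occurs more than once, including all of its copies ("Include all lines of each duplicate")
document.DeleteDuplicates("",eeIncludeAll);
// Each deleted line leaves its language code orphaned, so two code lines end up adjacent.
// Replace each such pair with the second line; runs of several orphaned codes need more than one pass.
document.selection.Replace("([a-z]{2,3} \\n)([a-z]{2,3} \\n)","\\2",eeFindReplaceCase | eeReplaceAll | eeFindReplaceRegExp,0);
document.selection.Replace("([a-z]{2,3} \\n)([a-z]{2,3} \\n)","\\2",eeFindReplaceCase | eeReplaceAll | eeFindReplaceRegExp,0);
document.selection.Replace("([a-z]{2,3} \\n)([a-z]{2,3} \\n)","\\2",eeFindReplaceCase | eeReplaceAll | eeFindReplaceRegExp,0);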
greektranslator
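The effect of the macro above can be simulated in plain JavaScript to show why the `Replace` call is repeated. This sketch assumes that `DeleteDuplicates` with `eeIncludeAll` removes every copy of any line that occurs more than once (the "Include all lines of each duplicate" option mentioned in the comments); the function names are hypothetical. After that step, each removed name leaves its language code next to the following code, and every regex pass roughly halves a run of adjacent codes, so the sketch loops until nothing changes instead of using a fixed three passes.

```javascript
// Sketch only: models the macro's effect on a "\n"-separated string.

// Assumed semantics of DeleteDuplicates("", eeIncludeAll):
// drop every line that occurs more than once, keeping no copy of it.
function deleteDuplicatesIncludeAll(text) {
  const lines = text.split("\n");
  const counts = new Map();
  for (const line of lines) counts.set(line, (counts.get(line) || 0) + 1);
  return lines.filter((line) => counts.get(line) === 1).join("\n");
}

// The macro's Replace step: a code line followed by another code line is
// an orphan, so keep only the second one. Repeat until nothing changes.
function removeOrphanCodes(text) {
  const orphanPair = /([a-z]{2,3} \n)([a-z]{2,3} \n)/g; // same pattern as the macro
  let previous;
  do {
    previous = text;
    text = text.replace(orphanPair, "$2");
  } while (text !== previous);
  return text;
}

function simulateMacro(text) {
  return removeOrphanCodes(deleteDuplicatesIncludeAll(text));
}

console.log(simulateMacro("ast \nCyperus esculentus\nca \nXufa\nen \nCyperus esculentus\neo \nCyperus esculentus\nes \nCyperus esculentus\neu \nBedaur\n"));
```

Like the original pattern, this only removes an orphaned code when another code line follows it, so an orphan on the very last line would remain.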

1 Answer

  1. In the Filter toolbar, select 1 in Number of Additional Visible Lines Above Matched Lines, enter Cyperus esculentus, and press the Enter key.

  2. Make sure the Block Multiple Changes button is clear (NOT set) in the same toolbar.

  3. On the Edit menu, select Select All and then Delete (or press Ctrl+A and then Delete while the keyboard focus is in the editor).

  4. Click the Abort button in the Filter toolbar.
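What these filter steps do can be modeled in plain JavaScript: every line containing the search string, plus the one line above it that the filter makes visible, is deleted, and only the hidden lines survive. This is a hypothetical sketch (`deleteFilteredLines` is not an EmEditor function); it assumes a case-sensitive substring match, like `eeFindReplaceCase` in the macro below, and `\n` line endings.

```javascript
// Sketch only: models "show matching lines + 1 line above, then delete all".
function deleteFilteredLines(text, search) {
  const lines = text.split("\n");
  const visible = new Set();
  lines.forEach((line, i) => {
    if (line.includes(search)) {     // case-sensitive substring match
      visible.add(i);                // the matched line
      if (i > 0) visible.add(i - 1); // one additional line above
    }
  });
  // Deleting everything visible in the filter leaves only the hidden lines.
  return lines.filter((_, i) => !visible.has(i)).join("\n");
}

console.log(deleteFilteredLines("ar \nسعد لذيذ\nast \nCyperus esculentus\nca \nXufa", "Cyperus esculentus"));
```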

If you would like to use a macro instead, here it is:

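fs = document.filters;
fs.Clear();
fs.AddFind( "Cyperus esculentus", eeFindReplaceCase, 0 );  // show only lines containing the string (case-sensitive)
fs.VisibleLinesAbove  = 1;    // also show one line above each match
fs.VisibleLinesBelow  = 0;
document.filters = fs;        // apply the filter
document.selection.SelectAll();
document.selection.Delete();  // delete all lines shown by the filter
fs.Clear();
document.filters = fs;        // remove the filter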

You can run this macro after you open your data file. To do this, save this code as, for instance, Filter.jsee, and then select this file from Select... in the Macros menu. Finally, open your data file, and select Run in the Macros menu while your data file is active. Make sure the Block Multiple Changes button is clear before you run the macro.

References: EmEditor Macro Reference: Filters Collection

Updates

I understand that "Cyperus esculentus" could be any other phrase. Assuming the duplicates always appear at even line numbers, here is a macro you can use instead. This macro selects all even-numbered lines, bookmarks the duplicates among the selected lines, and deletes all bookmarked lines (plus the one line above each). Make sure the Block Multiple Changes button is clear before you run the macro.

editor.ExecuteCommandByID(4323);  // clear all bookmarks
document.selection.StartOfDocument(false);
editor.ExecuteCommandByID(4208);  // No Wrap
nLines = document.GetLines();
document.selection.LineDown(false,1);
for( i = 0; i < nLines; i += 2 ) {
    editor.ExecuteCommandByID(4153);  // select character
    document.selection.CharRight(false,1);
    editor.ExecuteCommandByID(4153);
    document.selection.StartOfLine(false,eeLineView | eeLineHomeText);
    document.selection.LineDown(false,2);
}

document.DeleteDuplicates("",eeSortSelectionOnly | eeBookmark | eeIncludeAll);  // bookmark all duplicates in selected lines
document.selection.Collapse();

// filter bookmarked lines only
fs = document.filters;
fs.Clear();
fs.AddFind( "", 0, eeExFindBookmarkedOnly );
fs.VisibleLinesAbove  = 1;
fs.VisibleLinesBelow  = 0;
document.filters = fs;

document.selection.SelectAll();
document.selection.Delete(1);    // delete all filtered lines
fs.Clear();
document.filters = fs;
Yutaka
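The even-line idea in the update can also be sketched in plain JavaScript, which may help check the logic before running the macro. Assumptions: lines are `\n`-separated, the names sit on even-numbered lines (1-based) with their language codes on the odd-numbered lines directly above, and the function name is hypothetical. Only even-numbered lines are compared, so a language code can never be mistaken for a duplicate.

```javascript
// Sketch only: duplicates are looked for on even-numbered lines (1-based),
// which are the odd 0-based indices 1, 3, 5, ...
function deleteEvenLineDuplicates(text) {
  const lines = text.split("\n");

  // Count how often each even-numbered line occurs.
  const counts = new Map();
  for (let i = 1; i < lines.length; i += 2) {
    counts.set(lines[i], (counts.get(lines[i]) || 0) + 1);
  }

  // Mark every copy of a duplicate plus the line above it.
  const remove = new Set();
  for (let i = 1; i < lines.length; i += 2) {
    if (counts.get(lines[i]) > 1) {
      remove.add(i);     // the duplicate itself
      remove.add(i - 1); // the language code above it
    }
  }
  return lines.filter((_, i) => !remove.has(i)).join("\n");
}

console.log(deleteEvenLineDuplicates("de \nErdmandel\nsv \nJordmandel\nnl \nErdmandel"));
```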
  • Hi, thanks. The point is not to delete that specific string (Cyperus esculentus); I can already do that with the regex I posted originally, `.{2,3} \nCyperus esculentus\n`. The point is to handle any repeated string in texts of this format. For example, there could be any other text there instead of Cyperus esculentus. – greektranslator Jan 26 '21 at 21:09
  • I have a hard time understanding your question. Please write a minimal, reproducible example. https://stackoverflow.com/help/minimal-reproducible-example – Yutaka Jan 26 '21 at 22:36
  • It is simple: instead of Cyperus esculentus, any other repeated phrase could be there. The point is for a regex to check for existing duplicates and delete them as described (with a lookahead?). For example, this regex for removing non-adjacent duplicate lines, `^(.*)$((?:\r?\n.*)*?)^\1$\r?\n?`, would do half the job, as it leaves the line above: https://stackoverflow.com/questions/1573361/how-do-i-find-and-remove-duplicate-lines-from-a-file-using-regular-expressions – greektranslator Jan 27 '21 at 13:25
  • You should include more conditions about where duplicates appear. Do the duplicates always appear at even line numbers? Are the duplicates always more than 3 characters long? – Yutaka Jan 27 '21 at 15:26
  • I've updated my answer assuming the duplicates always appear at even line numbers – Yutaka Jan 27 '21 at 16:57
  • Hi, thanks! Meanwhile I found another solution (I updated my original post). One issue I had is that the macro recorder did not record the "Include all lines of each duplicate" option that was selected, and I had to experiment to find the right syntax, as there are no real examples of flag usage on this page: http://www.emeditor.org/en/macro_document_delete_duplicates.html – greektranslator Jan 28 '21 at 08:46
  • `eeIncludeAll` is the "Include all lines of each duplicate" option. It should be recorded by a macro. – Yutaka Jan 28 '21 at 15:46
  • No, it does not get recorded in my version: 17.8.1 – greektranslator Jan 28 '21 at 16:57