I have a large text file and a specific keyword that contains spaces (for example "ABC DEF G"). The keyword either occurs in the text file exactly two times or does not occur at all.

I want a Notepad++ search that automatically selects all the text between these two occurrences of the keyword (around 300 to 1000 lines). I will then perform some operation on the selected text via my plugin. So my first question is:

Is this auto-selection possible (via a regular expression or some existing plugin)?

If yes, can someone please suggest how?

As of now I am reading the entire text file to search it, which is time- and memory-consuming. Thanks.

somerandomguy

1 Answer

Try this regex:

(?<=ABC DEF G)[\s\S]*(?=ABC DEF G)

Explanation:

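  • (?<=ABC DEF G) - positive lookbehind: finds a position that is immediately preceded by the text ABC DEF G
  • [\s\S]* - matches zero or more occurrences of any character, including line breaks; the quantifier is greedy, so the match runs up to the last ABC DEF G in the file
  • (?=ABC DEF G) - positive lookahead: finds a position that is immediately followed by the text ABC DEF G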
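
To check the pattern outside Notepad++, here is a minimal sketch in Python. It is only an illustration: the `text_between` helper and the `sample` string are made up for this example, and Python's `re` engine is not the one Notepad++ uses, although this particular pattern behaves the same way in both.

```python
import re

KEYWORD = "ABC DEF G"  # the example keyword from the question

# Same pattern as above; re.escape guards against keywords that
# contain regex metacharacters (hypothetical precaution).
PATTERN = re.compile(
    r"(?<=" + re.escape(KEYWORD) + r")[\s\S]*(?=" + re.escape(KEYWORD) + r")"
)


def text_between(text):
    """Return the text between the two keywords, or None if there is no match."""
    match = PATTERN.search(text)
    return match.group(0) if match else None


sample = "header\nABC DEF G\nline 1\nline 2\nABC DEF G\nfooter\n"

print(repr(text_between(sample)))             # '\nline 1\nline 2\n'
print(text_between("no keyword in here"))     # None
```

Note that the match includes the line break right after the first keyword and the one right before the second keyword, and that the pattern simply finds nothing when the keyword is absent or occurs only once.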

Gurmanjot Singh
  • Just one thing, if you can help me. The text is like ABC DEF G *****, then some text, then ABC DEF G ******. I want the text starting from the line after the first ***** and ending where I reach the A of the second keyword. Also, what will this expression do if the keyword is not present? The first problem I can work around with the character scanning I already do in my plugin. Can you tell me what happens if the text is not present? – somerandomguy Oct 07 '17 at 06:49
  • @NikhilSaxena Try this for that case: `(?<=ABC DEF G)[^\n]*\n\K[\s\S]*(?=ABC DEF G)`. See [HERE](https://regex101.com/r/8cOYW8/2). This selects from the next line onwards. – Gurmanjot Singh Oct 07 '17 at 06:56
  • It works correctly in most cases, but in some cases it does not. The files where it fails are large, around 4 lakh (400,000) lines; for smaller files, up to 2.5 lakh (250,000) lines, it works. Can this search fail because of file size, or might there be some other reason? – somerandomguy Oct 07 '17 at 07:05
  • Well, it shouldn't. Since we are dealing with huge files, maybe it is just taking a bit longer for some of them. To reduce the number of steps, try this one: `ABC DEF G[^\n]*\n\K[\s\S]*(?=ABC DEF G)`. I have removed the positive lookbehind part for a faster match. – Gurmanjot Singh Oct 07 '17 at 07:17
  • Just one last requirement: it is also selecting one extra newline character. Can I skip selecting that newline character? It is affecting the decoding of the selected text. – somerandomguy Oct 09 '17 at 03:11
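
For the points raised in the comments (start on the line after the first keyword, leave out the line break before the second keyword, and do nothing when the keyword is absent), a plain substring search is one possible alternative to the regex. The following is only a sketch in Python, on the assumption that the plugin can see the buffer as a single string; `select_between` is a hypothetical helper, not part of Notepad++ or of any plugin API.

```python
def select_between(text, keyword="ABC DEF G"):
    """Return (start, end) offsets of the text between the two keyword
    lines, or None when the keyword does not occur twice."""
    first = text.find(keyword)
    if first == -1:
        return None  # keyword not present at all
    after_first = first + len(keyword)
    second = text.find(keyword, after_first)
    if second == -1:
        return None  # only one occurrence

    # Start on the line after the first keyword line (skips " *****").
    newline = text.find("\n", after_first, second)
    start = newline + 1 if newline != -1 else after_first

    # Drop the line break (LF or CRLF) just before the second keyword.
    end = second
    if end > start and text[end - 1] == "\n":
        end -= 1
    if end > start and text[end - 1] == "\r":
        end -= 1
    return start, end


sample = "intro\nABC DEF G *****\npayload 1\npayload 2\nABC DEF G *****\ntail\n"
span = select_between(sample)
if span is not None:
    start, end = span
    print(repr(sample[start:end]))  # 'payload 1\npayload 2'
```

Two `find` calls avoid the backtracking that a greedy `[\s\S]*` needs on very large files, and the `None` return makes the "keyword not present" case explicit.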