2

I am trying to find a regex that keeps &#xA; and removes everything else without changing the line order. Only some of the lines contain this pattern, one or more times. I tried (?<=&#xA;)(.+)|(.+)(?=&#xA;)|^((?!&#xA;).)*$, but it keeps only one occurrence per line, even when a line contains more. For example, I have something like this:

The client requires photos of a radioactive world&#xA;Reach the target planet.
The client requires photos.&#xA;&#xA;Reach the target planet.
The client requires photos of a desert world&#xA;Reach the target planet.
The client requires photos of an airless world. Reach the target planet.
The client requires photos of a strange world&#xA;&#xA;Reach the target planet&#xA;Make a quick scan.

Expecting exactly this:

&#xA;
&#xA;&#xA;
&#xA;

&#xA;&#xA;&#xA;

I would be glad if you could help.

Erdem Sarp

3 Answers

1

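You can use the following regex to match everything except &#xA;. Note that it is a negated character class, so it excludes the individual characters &, #, x, A and ; (and newlines) rather than the sequence &#xA; as a whole: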

[^&#xA;\n]+
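A quick way to see what this character class does is to run it on a line that contains a stray `x` or `;`. The sketch below uses Python's `re` module purely for illustration (the thread itself is about editor search and replace); `sample_bad` is a made-up line, not one from the question.

```python
import re

# Negated character class from the answer: it excludes the single characters
# '&', '#', 'x', 'A', ';' and newline, not the sequence "&#xA;" as a unit.
CHAR_CLASS = re.compile(r"[^&#xA;\n]+")

def strip_with_char_class(text: str) -> str:
    """Remove every run of characters matched by the character class."""
    return CHAR_CLASS.sub("", text)

# Line taken from the question: it contains no other '&', '#', 'x', 'A' or ';'.
sample_ok = "The client requires photos of a radioactive world&#xA;Reach the target planet."
# Hypothetical line with extra 'x' and ';' characters.
sample_bad = "Make an extra scan; then exit&#xA;Done"

print(strip_with_char_class(sample_ok))   # &#xA;
print(strip_with_char_class(sample_bad))  # x;x&#xA;
```

The second result keeps the loose `x` and `;` characters, which is the kind of leftover a larger file can produce.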

Demo

Ankit
  • It seems to work on my samples, but when I tried it on 50k lines it also kept unnecessary characters. It turns this sentence: "The disturbed region must be accessed by <SPECIAL>Portal<> Speak to the inhabitants of the Space Anomaly to learn more" into "&;A&;&;&; A" – Erdem Sarp Oct 10 '20 at 04:39
1

You could use a capturing group.

(.*?)((?:&#xA;){0,})

Details:

  • (.*?): Group 1 - matches any characters (except newlines), as few as possible
  • ((?:&#xA;){0,}): Group 2 - matches zero or more consecutive &#xA;
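The two groups can be tried outside an editor as well. The following is a minimal sketch in Python, not the answerer's own setup: because the whole pattern can match an empty string and replace behaviour differs between engines, it collects Group 2 from each match instead of relying on a substitution. The helper name `keep_entities` is hypothetical.

```python
import re

# Pattern from the answer: Group 1 is the lazy filler, Group 2 the run of "&#xA;".
PATTERN = re.compile(r"(.*?)((?:&#xA;){0,})")

def keep_entities(text: str) -> str:
    """Keep only what Group 2 captured, line by line, preserving line order."""
    kept_lines = []
    for line in text.split("\n"):
        # Most matches are empty; only matches that start at "&#xA;" contribute text.
        kept_lines.append("".join(m.group(2) for m in PATTERN.finditer(line)))
    return "\n".join(kept_lines)

sample = "\n".join([
    "The client requires photos.&#xA;&#xA;Reach the target planet.",
    "The client requires photos of an airless world. Reach the target planet.",
    "The client requires photos of a strange world&#xA;&#xA;Reach the target planet&#xA;Make a quick scan.",
])

print(keep_entities(sample))
```

Lines without any `&#xA;` come out empty, so the number and order of lines stay the same.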

Demo

Thân LƯƠNG Đình
  • Strangely, it doesn't work in Notepad++. It says n lines changed, but I can't see any difference. – Erdem Sarp Oct 10 '20 at 04:54
  • 1
    OK, I just tried it in Sublime Text and it worked exactly as I wanted. I don't know why Notepad++ gives such a result. Anyway, this will do. Thank you for the help. – Erdem Sarp Oct 10 '20 at 04:58
1

You can make use of (*SKIP)(*FAIL) to match &#xA; and then discard that match, so that it is left out of what gets replaced.

Then match all characters except &, and when it does encounter &, assert that it is not directly followed by #xA;

Find what

&#xA;(*SKIP)(*FAIL)|[^&\r\n]+(?:&(?!#xA;)[^&\r\n]*)*

Replace with:

Leave empty

Explanation

  • &#xA; Match literally
  • (*SKIP)(*FAIL)| Consume the characters that you want to avoid
  • [^&\r\n]+ Match 1+ times any char except & or a newline
  • (?: Non capture group
    • &(?!#xA;) Match & if not directly followed by #xA;
    • [^&\r\n]* Match 0+ times any char except & or a newline
  • )* Close the non capture group and repeat 0+ times
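Python's built-in `re` module does not support the `(*SKIP)(*FAIL)` verbs, so the sketch below swaps in a different but related technique: `&#xA;` is captured in a group and put back by the replacement function, while the second alternative (copied unchanged from the pattern above) is replaced with nothing. The function name `keep_only_entities` is hypothetical.

```python
import re

# First alternative: capture "&#xA;" so it can be restored.
# Second alternative: the removal part of the answer's pattern, unchanged.
PATTERN = re.compile(r"(&#xA;)|[^&\r\n]+(?:&(?!#xA;)[^&\r\n]*)*")

def keep_only_entities(text: str) -> str:
    """Delete everything except "&#xA;" and line breaks."""
    # group(1) is None when the second alternative matched, so that text is dropped.
    return PATTERN.sub(lambda m: m.group(1) or "", text)

sample = "\n".join([
    "The client requires photos of a radioactive world&#xA;Reach the target planet.",
    "The client requires photos.&#xA;&#xA;Reach the target planet.",
    "The client requires photos of a desert world&#xA;Reach the target planet.",
    "The client requires photos of an airless world. Reach the target planet.",
    "The client requires photos of a strange world&#xA;&#xA;Reach the target planet&#xA;Make a quick scan.",
])

print(keep_only_entities(sample))
```

Neither alternative can match a line break, so the line order from the question is preserved.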

Regex demo


The fourth bird