I'm terrible at regex and need to remove everything from a large block of text except for a certain variable assignment that occurs numerous times. I'd like to remove everything except for instances of mc_gross=anyint.

looter

3 Answers

Generally we'd need to use "negative lookarounds" to match everything except a specified string. But these are fairly inefficient (although that's probably of little concern to you in this instance), and lookarounds are not supported by all regex engines (I'm not sure about Notepad++, and even then it probably depends on the version you're using).

If you're interested in learning about that approach, refer to "How to negate specific word in regex?"

But regardless, since you are using Notepad++, I'd recommend selecting your target, then inverting the selection.

This will select each instance, allowing for optional whitespace on either side of the '=' sign:

mc_gross\s*=\s*\d+
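
If you want to sanity-check the pattern outside the editor, here is a minimal sketch using Python's re module. The sample text is made up for illustration, and Python's regex flavour is not identical to the one in Notepad++, although this pattern only uses constructs that I'd expect both to handle the same way.

```python
import re

# Same pattern as above: optional whitespace on either side of '='.
PATTERN = re.compile(r"mc_gross\s*=\s*\d+")

# Made-up sample text, for illustration only.
sample = "item=1&mc_gross=25&tax=0\nnote: mc_gross = 7 here\nno match on this line"

# findall returns every non-overlapping match, in order of appearance.
matches = PATTERN.findall(sample)
print(matches)
```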

The following answer over on Super User explains how to use bookmarks in Notepad++ to achieve the "inverse selection":

https://superuser.com/questions/290247/how-to-delete-all-line-except-lines-containing-a-word-i-need

Substitute the regex above for the one they're using over there.

Sepster
  • How would I go about doing that? I've tried using bookmarks but with no luck. – looter Apr 05 '13 at 06:01
  • @looter how would you invert the selection? I've updated the answer with a link that explains the process, using bookmarks. – Sepster Apr 05 '13 at 06:29
  • I had tried that, but it works on whole lines, and the data I need also sits within lines. Here is an expression that's very close to what I need, but I am having trouble getting it to match numbers too: (?!mc_gross=\d+)\b\w+ . It selects everything but mc_gross= ; the problem is that it selects the number after mc_gross= too. – looter Apr 05 '13 at 06:43
  • @looter ok, what about searching _everything_, with your interesting bits captured, then replacing with the capture? On an iPad at the mo so can't test if notepad++ will do that like we want... But try search: .*?\b(mc_gross=\d+).* and then replace with \1 . Otherwise can look at this later tonight for you. In the mean time you're welcome to unaccept your answer so that someone can solve this properly for you :-) – Sepster Apr 05 '13 at 10:06
  • Thanks for all the help guys, got it sorted out! – looter Apr 06 '13 at 05:20
  • @looter isn't the accepted answer pretty much just as per my previous comment? Either way, glad you got there in the end! :-) – Sepster Apr 06 '13 at 10:42

You could do a regular expression replace of ^.*\b(mc_gross\s*=\s*\d+)\b.*$ with \1. That will remove everything other than the wanted text on each line that contains it. Note that on lines where the wanted text occurs two or more times, only one occurrence will be retained (normally the last one, because the leading .* is greedy).

In the search, ^.*\b matches from start of line to a word boundary before the wanted text; \b.*$ matches everything from a word boundary after the wanted text until end of line; the round brackets capture the wanted text for the replacement text. If text such as abcmc_gross=13def should be matched and retained as mc_gross=13, then delete the \bs from the search.
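
For anyone who wants to see the per-line behaviour before running it on real data, here is a rough sketch of the same replace in Python. It assumes made-up sample text, and re.MULTILINE stands in for the editor's line-by-line handling of ^ and $; it is an illustration, not a claim about how Notepad++ works internally.

```python
import re

# The search pattern from this answer; MULTILINE makes ^ and $ apply per line.
LINE_PATTERN = re.compile(r"^.*\b(mc_gross\s*=\s*\d+)\b.*$", re.MULTILINE)

# Made-up sample: one hit, two hits on one line, and a line with no hit.
sample = "a=1 mc_gross=10 b=2\nmc_gross=3 x mc_gross=4 y\nno target here"

# Replace each matching line with just the captured text.
result = LINE_PATTERN.sub(r"\1", sample)
print(result)
```

The second line keeps only one of its two occurrences, and the line without the wanted text is left untouched, which is why the separate step of removing unwanted lines is still needed.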

To remove unwanted lines, do a regular expression search for ^mc_gross\s*=\s*\d+$ from the Mark tab, tick Bookmark line and click Mark all. Then use Menu => Search => Bookmark => Remove unmarked lines.
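
The mark-then-remove step amounts to keeping only the lines that consist entirely of the wanted text. A small Python sketch of that idea follows, with a hypothetical list of lines standing in for the document; it mimics the outcome of the bookmark steps rather than the editor feature itself.

```python
import re

# Matches a line that consists solely of the wanted text.
WHOLE_LINE = re.compile(r"^mc_gross\s*=\s*\d+$")

# Hypothetical lines, e.g. what is left after the replace step above.
lines = ["mc_gross=10", "no target here", "mc_gross = 4", "mc_gross=4 trailing"]

# Keep the "marked" lines, drop the rest.
kept = [line for line in lines if WHOLE_LINE.match(line)]
print(kept)
```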

AdrianHHH

Find what: [\s\S]*?(mc_gross=\d+|\Z)
Replace with: \1

Position the cursor at the start of the text, then click Replace All.

Add word boundaries \b around mc_gross=\d+ if you think it's necessary.
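
Here is a minimal Python sketch of the same find and replace, assuming made-up sample text. Python's \Z (end of string) plays the role it has in the pattern above: the final alternative swallows whatever trails the last occurrence. Treat it as an illustration of the idea, since the two regex engines are not identical.

```python
import re

# Lazily consume anything (including newlines) up to the next occurrence,
# or up to the end of the text if there are no more occurrences.
PATTERN = re.compile(r"[\s\S]*?(mc_gross=\d+|\Z)")

# Made-up sample text spanning two lines.
sample = "foo mc_gross=12 bar\nbaz mc_gross=7 end"

# Each match is replaced by its capture; the trailing match captures nothing.
result = PATTERN.sub(r"\1", sample)
print(result)
```

Note that the retained occurrences end up joined with nothing between them, so you may want a separator in the replacement, for example \1 followed by a space.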

MikeM
  • Confused by `[\s\S]`. The `\s` means any space-like character. The `\S` means any non-space-like character. The combination should mean any character, the same as `.` (i.e. dot). – AdrianHHH Apr 05 '13 at 13:52
  • @AdrianHHH. Yes, but unlike the `.` it will also match newlines. (Although there is a checkbox in the editor to make `.` match newlines anyway.) – MikeM Apr 05 '13 at 13:53
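
The difference discussed in these comments is easy to see in a short Python sketch. The sample text is made up, and re.DOTALL is used here only as a rough analogue of the editor checkbox that makes `.` match newlines.

```python
import re

text = "line one\nline two"

# By default '.' stops at a newline.
dot_match = re.match(r".*", text).group()

# [\s\S] matches any character at all, newlines included.
any_match = re.match(r"[\s\S]*", text).group()

# With DOTALL, '.' behaves like [\s\S].
dotall_match = re.match(r".*", text, re.DOTALL).group()

print(repr(dot_match), repr(any_match), repr(dotall_match))
```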