3

The question:

Forget everything below for a second, since my detail seems to be confusing people (or else this is really complicated).

I want to match, with regex, "everything except what this (any) capture group matches".

What I've tried:

I saw this question, but the question and its answers all discuss one specific situation without actually explaining how or why the syntax works, so I can't figure it out.

I looked at "negative-look-ahead" with ?!, but don't really understand how that achieves what I'm trying to do.

I'm trying to match everything except a capture group, for example ("[a-z]*",).

For example, in this multi-line list:

"buckeye",
"buckeye"
,
."
,"
"fbfdb
"feve,

How do I select everything except the capture group (which in my case should match "buckeye", or any set of " + any num a-z chars + ",) with Regex?


The reason I need this is that I have a file with lots of entries such as:

"aidman",
"aidmen",
"aids",
"aiglet",
"aiglets",
"aigret",
"aigrets",
"aigrette",
"aigrettes",
"aiguille",
"aiguilles",
"aikido",

and I ran some replacements with my text editor on it to change the format, but a bunch of random things happened to ~20 of the 100,000 lines (a bug). So I need to find the improperly formatted lines.

Clarification:

My goal for this question is simply to understand how to say "I want to match everything except this capture group".

Community
  • 1
    I don't understand what "everything except capture group" means. Give us several examples of what you want to match, and what you don't want to match. – Dialecticus Nov 01 '14 at 02:07
  • @Dialecticus I clarified above, sorry. But really, I'm trying to learn how to do it with *any* capture group. –  Nov 01 '14 at 02:12
  • What would be the expected output? – Avinash Raj Nov 01 '14 at 02:13
  • @AvinashRaj I want to match anything that *doesn't* match that capture group. –  Nov 01 '14 at 02:14
  • so you want to match all the characters except `"buckeye",`.. – Avinash Raj Nov 01 '14 at 02:15
  • @AvinashRaj well now to think of it, I want to select only one line at once, but I know how to do that already I think. But yes *any* one line but the one defined there. You can just teach me how to select any other character though, I know how to modify that. My goal for this question is simply to understand how to say "I want to match everything except this capture group". –  Nov 01 '14 at 02:15
  • @AvinashRaj see the first line of the first multi-line list in my question? Everything *except* that first line should match the regex –  Nov 01 '14 at 02:19
  • Basically you need to replace the captured part with an empty string, and print all. – JorgeeFG Nov 01 '14 at 02:19
  • @Jorge yes that works but I need to insert this regex into a Notepad++ search, so that's a no-go. –  Nov 01 '14 at 02:20
  • possible duplicate of [Regular expression to match string not containing a word?](http://stackoverflow.com/questions/406230/regular-expression-to-match-string-not-containing-a-word) –  Nov 10 '14 at 07:24
  • I was the user who posted this question, I just deleted my old account for a clean track record. Just to be helpful, this is an exact duplicate of this hugely popular question: http://stackoverflow.com/questions/406230/regular-expression-to-match-string-not-containing-a-word/26272200#26272200 –  Nov 10 '14 at 07:26

2 Answers

3

You could use the PCRE verbs (*SKIP)(*F):

"[^"]*",(*SKIP)(*F)|.+


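The first alternative matches each "...", string, and (*SKIP)(*F) then discards that match and stops the engine from retrying inside it. The second alternative, .+, matches whatever remains on each line.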
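Python's built-in re module does not support PCRE verbs, so as a rough stand-in (not the exact mechanism above) the same idea can be sketched with a skip-or-capture alternation: well-formed "...", strings are matched and ignored, and anything else lands in a capture group. The sample text is the list from the question; the variable names are only illustrative.

```python
import re

# The multi-line list from the question.
text = '"buckeye",\n"buckeye"\n,\n."\n,"\n"fbfdb\n"feve,'

# Left alternative: a well-formed "...", string (matched, then ignored).
# Right alternative: anything else on the line, captured in group 1.
SKIP_OR_CAPTURE = re.compile(r'"[^"]*",|(.+)')

leftovers = [
    m.group(1)
    for m in SKIP_OR_CAPTURE.finditer(text)
    if m.group(1) is not None  # None means the left alternative matched
]

for piece in leftovers:
    print(piece)
```

Note that [^"]* can run across line breaks, so a stray opening quote on one line may pair with a quote on a later line; checking line by line avoids that.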

OR

With a negative lookahead assertion:

^(?!.*"[^"]*",).*$


(?!.*"[^"]*",) is a negative lookahead that asserts there is no string like "...", anywhere in that line. If the assertion holds, the line is matched. Lookarounds in regex are used for condition checking: they don't consume any characters, they only assert whether a match would or would not be possible at that position.

^                        the beginning of the string
(?!                      look ahead to see if there is not:
  .*                       any character except \n (0 or more
                           times)
  "                        '"'
  [^"]*                    any character except: '"' (0 or more
                           times)
  ",                       '",'
)                        end of look-ahead
.*                       any character except \n (0 or more times)
$                        before an optional \n, and the end of the
                         string
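As a minimal sketch of the lookahead pattern in action, here it is applied line by line in Python to the list from the question. The variable names are illustrative, and testing each line separately is an assumption made to keep [^"]* from spanning line breaks.

```python
import re

# Match a whole line only if no "...", string occurs anywhere in it.
NO_QUOTED_ENTRY = re.compile(r'^(?!.*"[^"]*",).*$')

sample = [
    '"buckeye",',
    '"buckeye"',
    ',',
    '."',
    ',"',
    '"fbfdb',
    '"feve,',
]

# Keep every line on which the negative lookahead succeeds.
flagged = [line for line in sample if NO_QUOTED_ENTRY.match(line)]

for line in flagged:
    print(line)
```

Because the lookahead begins with .*, a line such as b"uckeye", is not flagged: it still contains a "...", string somewhere in it.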
Community
Avinash Raj
  • 1
    Oh this works as it is! Thank you. I wish you could explain it so I'll know how to do it in the future. –  Nov 01 '14 at 02:32
  • @jt0dd which one? first or second.. I think i already explained the second one. Tell me which part you didn't understand. – Avinash Raj Nov 01 '14 at 02:33
  • Well they both look more complex than what I've used so far (sorry, still learning), could you explain how *both work? –  Nov 01 '14 at 02:34
  • Why having `.*` at the beginning of the negative lookahead part? This pattern will skip `b"uckeye",` which is pretty much an incorrect line in my view. – Dialecticus Nov 01 '14 at 02:44
  • 1
    then remove the `.*` from the lookahead `^(?!"[^"]*",).*$` . Who knows what exactly the op wants.... – Avinash Raj Nov 01 '14 at 02:47
0

So you want to find errors in the file, where a correct line has the form "[a-z]*",. While I can't say how to do that in regex, I can say how I would achieve this goal. I would use Notepad++ in several steps:

  1. Ctrl+F, change tab from Find to Mark, check the option "Bookmark line", and search for the pattern "[a-z]*",.
  2. Once all correct lines are bookmarked, invert the bookmarks (menu Search > Bookmark > Inverse Bookmark)
  3. Copy all bookmarked lines (menu Search > Bookmark > Copy Bookmarked Lines), and paste them to another empty file (Ctrl+V)
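Outside Notepad++, the same mark-and-invert idea can be sketched in a few lines of Python, assuming each entry sits on its own line. The function name and the sample lines are hypothetical.

```python
import re

# A correct line is exactly: quote, lowercase letters, quote, comma.
GOOD_LINE = re.compile(r'"[a-z]*",')

def find_bad_lines(lines):
    """Return (line_number, text) for every line that is not a well-formed entry."""
    return [
        (number, line)
        for number, line in enumerate(lines, start=1)
        if GOOD_LINE.fullmatch(line) is None  # "invert": keep the non-matches
    ]

sample = ['"aidman",', 'b"uckeye",', '"aids",', '"aiglet"', '"Aigret",']
bad = find_bad_lines(sample)

for number, line in bad:
    print(number, line)
```

Using fullmatch means the whole line has to fit the pattern, so stray characters before or after an otherwise valid entry (including trailing whitespace) are reported too.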
Dialecticus
  • thanks this is indirectly awesome since it saves my *ss right now. I'm so surprised that no one seems to know how to do "everything but this capture group" in regex! –  Nov 01 '14 at 02:26