2

I am looking for a regex that will exclude the words below when searching a large text file (or several files).

@author
@Autowired
@Override
@param
@SuppressWarnings

I tried the following, but it does not work as expected.

@[^(author)(Autowired)(Override)(param)(SuppressWarnings)].*
Ousmane D.
Viswa
  • Why is this tagged `javascript`? – shmosel Mar 06 '17 at 03:27
  • @Viswa shmosel has a good point... to answer the question correctly, we really need to know what language you're using. If you're writing a Node.js program to process Java sources, then you should have only the "javascript" tag. If you're writing a Java program, you should have the "java" tag. It makes a difference because there are regex features that are supported by some languages and not others. – ajb Mar 06 '17 at 04:32

4 Answers

6

Try the following regex, which uses a negative lookahead:

@(?!author|Autowired|Override|param|SuppressWarnings).*

see regex demo / explanation
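
As a rough illustration, here is how that pattern behaves in Python. The question does not say which language is in use, so Python and the sample text are assumptions; the lookahead syntax itself is also accepted by Java and JavaScript.

```python
import re

# The pattern from this answer: "@" not immediately followed by one of the listed words.
PATTERN = re.compile(r"@(?!author|Autowired|Override|param|SuppressWarnings).*")

# Made-up sample input, one tag per line.
sample = """\
@author Jane
@Deprecated
@Override
public void run() {}
@return the result
"""

# "." does not cross newlines by default, so each match stops at the end of its line.
matches = PATTERN.findall(sample)
print(matches)  # ['@Deprecated', '@return the result']
```

Note that the lookahead only checks the characters right after the @, so a tag that merely starts with a listed word (for example a hypothetical @parameter) is skipped as well.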

m87
3

Square brackets in regexes are used for character classes. When you put a list of characters in square brackets, this matches one character that is one of the ones listed. So

[author]

matches one character, if it's a, h, o, r, t, or u. It does not look for the word author. Putting ^ at the start of the brackets negates the class: it still matches exactly one character, but one that isn't in the list:

[^author]

matches one character as long as it's not a, h, o, r, t, or u.

But the key thing here is that [] cannot be used to match words or other sequences. In your example,

@[^(author)(Autowired)(Override)(param)(SuppressWarnings)].*

the part in square brackets matches one character that is not (, a, u, or any of the other characters that appear in the square brackets (many of those characters appear multiple times, but that doesn't affect anything).
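
A small sketch of the consequence, written in Python as an assumption (the question does not name a language; the sample strings and the `first_match` helper are made up for illustration). Because only the first character after the @ is tested, the original pattern rejects any tag that starts with one of the listed letters:

```python
import re

# A character class matches exactly one character, never a whole word.
assert re.fullmatch(r"[author]", "t") is not None
assert re.fullmatch(r"[author]", "author") is None

# The pattern from the question, unchanged.
original = re.compile(r"@[^(author)(Autowired)(Override)(param)(SuppressWarnings)].*")

def first_match(text):
    """Return the first match of the original pattern in text, or None."""
    m = original.search(text)
    return m.group(0) if m else None

print(first_match("@Deprecated"))        # '@Deprecated' ('D' is not in the class)
print(first_match("@author Jane"))       # None, but only because 'a' is in the class
print(first_match("@return the value"))  # None: wrongly rejected, 'r' is in the class
```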

ajb
1

You can use a negative lookahead:

@(?!author|Autowired|Override|param|SuppressWarnings)\S+

Basically, it looks for an @ that is not followed by any of the words in that list, and then matches the run of non-whitespace characters after it.
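
To show what the \S+ ending changes compared with a trailing .*, here is a hedged Python comparison; the language choice, the variable names, and the sample line are assumptions made for illustration.

```python
import re

WORDS = "author|Autowired|Override|param|SuppressWarnings"

# Same lookahead, two different tails.
rest_of_line = re.compile(rf"@(?!{WORDS}).*")   # consumes everything up to the line end
single_token = re.compile(rf"@(?!{WORDS})\S+")  # stops at the first whitespace

line = "@Deprecated @Override public void run() {}"

rest_matches = rest_of_line.findall(line)
token_matches = single_token.findall(line)

print(rest_matches)   # ['@Deprecated @Override public void run() {}']
print(token_matches)  # ['@Deprecated']
```

With .*, the excluded @Override is swallowed as part of the first match; with \S+, each tag is considered on its own.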

Sverri M. Olsen
0

To flip the script: if you're actually trying to take the text file and remove the things that are in your list of keywords, you'll probably want to find them with a pattern more like this: @(author|Autowired|Override|param|SuppressWarnings)\b. The trailing \b is just a precaution so that @authority and other unlikely cases are not matched.
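
A minimal sketch of that removal in Python, assuming Python is acceptable and using an invented sample string:

```python
import re

# Matches only the listed tags; \b requires the word to end right there.
KEYWORDS = re.compile(r"@(?:author|Autowired|Override|param|SuppressWarnings)\b")

text = "@author Jane, see @authority and @param x"

# Which tags are found: "@authority" is left alone because of \b.
found = KEYWORDS.findall(text)
print(found)  # ['@author', '@param']

# Remove the listed tags and keep the rest of the text as it is.
cleaned = KEYWORDS.sub("", text)
print(cleaned)
```

The group is written as non-capturing, (?:...), so that `findall` returns the whole tag including the @ rather than only the word inside the parentheses.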