
I want to match a Telegram username in message text and delete the entire line. I've tried this pattern, but the problem is that it also matches emails:

.*(@(?=.{5,64}(?:\s|$))(?![_])(?!.*[_]{2})[a-zA-Z0-9_]+(?<![_.])).*

The pattern should match all of these lines:

Hi @username how are you?

Hi @username.how are you?

@username.

And it should not match an email like this:

Hi email to something@domain.com
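
The problem can be reproduced with a short Python sketch. Python's `re` module is an assumption here, since the question does not name a regex engine. It runs the pattern above, unchanged, against the four sample lines and records which ones match.

```python
import re

# Pattern copied from the question, unchanged.
ORIGINAL = re.compile(
    r".*(@(?=.{5,64}(?:\s|$))(?![_])(?!.*[_]{2})[a-zA-Z0-9_]+(?<![_.])).*"
)

samples = [
    "Hi @username how are you?",
    "Hi @username.how are you?",
    "@username.",
    "Hi email to something@domain.com",  # should NOT match, but does
]

# Map each sample line to whether the pattern matches it.
results = {line: bool(ORIGINAL.search(line)) for line in samples}
for line, matched in results.items():
    print(f"{matched!s:5}  {line}")
```

Every line, including the email, reports a match, because nothing in the pattern restricts what may come before the @.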

Ali Raghebi
  • It may have more than one emoji before @ – Ali Raghebi Aug 07 '20 at 20:00
  • I was thinking `.*[^a-zA-Z]@` ... which would be far from perfect. Then I looked up http://emailregex.com/ And thought... maybe that would be helpful? You could maybe get your match as you have it, then use another regex to check if the "username" is actually a username, or if it's an email. – Reed Aug 07 '20 at 20:00
  • Is it about emojis only? Or non word characters? Can the @ occur more than once in the string? – The fourth bird Aug 07 '20 at 20:02

3 Answers


Use

.*\B@(?=\w{5,32}\b)[a-zA-Z0-9]+(?:_[a-zA-Z0-9]+)*.*


\B before @ means there must be a non-word character or start of string right before the @.

EXPLANATION

NODE                     EXPLANATION
--------------------------------------------------------------------------------
  .*                       any character except \n (0 or more times
                           (matching the most amount possible))
--------------------------------------------------------------------------------
  \B                       the boundary between two word chars (\w)
                           or two non-word chars (\W)
--------------------------------------------------------------------------------
  @                        '@'
--------------------------------------------------------------------------------
  (?=                      look ahead to see if there is:
--------------------------------------------------------------------------------
    \w{5,32}                 word characters (a-z, A-Z, 0-9, _)
                             (between 5 and 32 times (matching the
                             most amount possible))
--------------------------------------------------------------------------------
    \b                       the boundary between a word char (\w)
                             and something that is not a word char
--------------------------------------------------------------------------------
  )                        end of look-ahead
--------------------------------------------------------------------------------
  [a-zA-Z0-9]+             any character of: 'a' to 'z', 'A' to 'Z',
                           '0' to '9' (1 or more times (matching the
                           most amount possible))
--------------------------------------------------------------------------------
  (?:                      group, but do not capture (0 or more times
                           (matching the most amount possible)):
--------------------------------------------------------------------------------
    _                        '_'
--------------------------------------------------------------------------------
    [a-zA-Z0-9]+             any character of: 'a' to 'z', 'A' to
                             'Z', '0' to '9' (1 or more times
                             (matching the most amount possible))
--------------------------------------------------------------------------------
  )*                       end of grouping
--------------------------------------------------------------------------------
  .*                       any character except \n (0 or more times
                           (matching the most amount possible))
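
As a minimal sketch of how this pattern could be used to delete whole lines, here is a Python version. Python's `re` module, the trailing `\n?` (added so the line break is removed together with the line) and the helper name `drop_username_lines` are assumptions for illustration, not part of the pattern above.

```python
import re

# The pattern above, plus an optional "\n" so the line break goes too
# (the "\n?" is an addition for this sketch).
USERNAME_LINE = re.compile(
    r".*\B@(?=\w{5,32}\b)[a-zA-Z0-9]+(?:_[a-zA-Z0-9]+)*.*\n?"
)


def drop_username_lines(text: str) -> str:
    """Remove every line that contains an @username (hypothetical helper)."""
    return USERNAME_LINE.sub("", text)


sample = (
    "Hi @username how are you?\n"
    "Hi @username.how are you?\n"
    "@username.\n"
    "Hi email to something@domain.com\n"
)

cleaned = drop_username_lines(sample)
print(repr(cleaned))
```

Only the email line is left, since the `g` right before its @ is a word character and `\B` cannot match there.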
Ryszard Czech

.*[\W](@(?=.{5,64}(?:\s|$))(?![_])(?!.*[_]{2})[a-zA-Z0-9_]+(?<![_.])).*

I've added [\W] (a non-word character) before the @ symbol. You can check the result here: https://regex101.com/r/yFGegO/1
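
A quick way to check this pattern against the sample lines from the question is the Python sketch below. Using Python's `re` module and testing each line as a separate string are assumptions here; the regex101 link may use different settings.

```python
import re

# Pattern from this answer, unchanged.
PATTERN = re.compile(
    r".*[\W](@(?=.{5,64}(?:\s|$))(?![_])(?!.*[_]{2})[a-zA-Z0-9_]+(?<![_.])).*"
)


def matches(line: str) -> bool:
    """Return True if the pattern matches somewhere in the line."""
    return PATTERN.search(line) is not None


checks = {
    line: matches(line)
    for line in (
        "Hi @username how are you?",
        "Hi @username.how are you?",
        "@username.",
        "Hi email to something@domain.com",
    )
}
for line, matched in checks.items():
    print(f"{matched!s:5}  {line}")
```

The email is rejected, but so is `@username.` when it is tested as a string on its own: `[\W]` has to consume a character, and nothing precedes the @ there.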

Bugaloo

Nothing new under the sun, but basically other patterns can be reduced to:

.*?\B@\w{5}.*


or possibly:

.*?\B@\w{5,64}\b.*

if you want to be more precise, but is it really needed?

Note: if you want to remove the newline sequence too, add \R? at the end of the pattern.
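
Not every engine supports \R; Python's built-in `re` module, for one, does not. A rough sketch of the same idea in Python, assuming an explicit alternation `(?:\r\n|\n|\r)?` as a stand-in for `\R?` and a hypothetical helper name, could look like this:

```python
import re

# First pattern from this answer, followed by a hand-written stand-in
# for \R? (an assumption of this sketch, since re has no \R escape).
LINE_WITH_USERNAME = re.compile(r".*?\B@\w{5}.*(?:\r\n|\n|\r)?")


def remove_username_lines(text: str) -> str:
    """Delete each line containing an @username, newline included."""
    return LINE_WITH_USERNAME.sub("", text)


text = (
    "keep\r\n"
    "@username\r\n"
    "Hi @username how are you?\n"
    "mail something@domain.com\n"
)

stripped = remove_username_lines(text)
print(repr(stripped))
```

Lines with a username disappear together with their line ending, whether it is `\n` or `\r\n`, and the email line stays.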

Casimir et Hippolyte