Regular Expression to match #hashtag but not #hashtag; (with semicolon)

Question

I have the current regular expression:

/(?<=[\s>]|^)#(\w*[A-Za-z_]+\w*)/g

Which I'm testing against the string:

Here's a #hashtag and here is #not_a_tag; which should be different. Also testing: Mid#hash. #123 #!@£ and <p>#hash</p>

For my purposes, only two hashtags should be detected in this string. How can I alter the expression so that it doesn't match hashtags that end with a semicolon (;)? In my example this is #not_a_tag;
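To make the problem reproducible, here is a minimal sketch in JavaScript, assuming an engine with lookbehind support (for example a recent Node.js). The names `sample1`, `currentPattern` and `found1` are illustrative only.

```javascript
// Test string from the question.
const sample1 = "Here's a #hashtag and here is #not_a_tag; which should be different. " +
  "Also testing: Mid#hash. #123 #!@£ and <p>#hash</p>";

// Current expression: '#' preceded by whitespace, '>' or start of input,
// followed by word characters that include at least one letter or underscore.
const currentPattern = /(?<=[\s>]|^)#(\w*[A-Za-z_]+\w*)/g;

const found1 = sample1.match(currentPattern);
console.log(found1); // three matches; "#not_a_tag" is the unwanted one
```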

Cheers.

tk78 · Accepted Answer · 2016-07-22T05:42:09.353

38

How about the following:

\B(\#[a-zA-Z]+\b)(?!;)

Regex Demo

\B -> Not a word boundary
(#[a-zA-Z]+\b) -> Capturing group: a # followed by one or more letters (a-z or A-Z), with a word boundary at the end
(?!;) -> Not followed by ;
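
As a quick check, the pattern can be run against the string from the question. This is a sketch in JavaScript; the `g` flag and the variable names are my additions, not part of the answer.

```javascript
const sample2 = "Here's a #hashtag and here is #not_a_tag; which should be different. " +
  "Also testing: Mid#hash. #123 #!@£ and <p>#hash</p>";

// Pattern from the answer, with 'g' added so that every match is collected.
const acceptedPattern = /\B(\#[a-zA-Z]+\b)(?!;)/g;

const tags2 = sample2.match(acceptedPattern);
console.log(tags2); // [ '#hashtag', '#hash' ]

// A tag directly followed by ';' is rejected by the negative lookahead.
const withSemicolon = "#tag;".match(acceptedPattern);
console.log(withSemicolon); // null

// Side effect of [a-zA-Z]+\b: tags containing '_' or digits are rejected
// as well, with or without a trailing semicolon.
const withUnderscore = "#snake_case".match(acceptedPattern);
console.log(withUnderscore); // null
```

Note that #not_a_tag is excluded here for two reasons: the semicolon and the underscore.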

edited Jul 22 '16 at 05:42

answered Jul 21 '16 at 14:31


4

Did you mean `\B`? `\W` needs an actual character to be present before the `#`. – Tim Pietzcker Jul 21 '16 at 14:39
Accepted answer for least steps. \B is likely what I'll be using. – Wex Jul 21 '16 at 16:15
2

It does not match #007 nor #50cents which are real hashtags. – alemol Oct 15 '18 at 20:04
Does it support non-English languages? – Chitrang Jul 30 '21 at 16:15

score 11 · Answer 2 · answered Apr 01 '20 at 13:31


This is the best practice.

(#+[a-zA-Z0-9(_)]{1,})
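
For comparison with the question's requirements, here is a small JavaScript sketch of what this pattern picks up. The `g` flag and the variable names are assumptions added for the demonstration.

```javascript
const sample3 = "Here's a #hashtag and here is #not_a_tag; which should be different. " +
  "Also testing: Mid#hash. #123 #!@£ and <p>#hash</p>";

// Pattern from the answer, with 'g' added to collect every match.
const broadPattern = /(#+[a-zA-Z0-9(_)]{1,})/g;

const tags3 = sample3.match(broadPattern);
console.log(tags3);
// [ '#hashtag', '#not_a_tag', '#hash', '#123', '#hash' ]

// Repeated '#' and parentheses are accepted too, because of '#+' and the
// literal '(' and ')' inside the character class.
const oddTag = "##tag(1)".match(broadPattern);
console.log(oddTag); // [ '##tag(1)' ]
```

So the pattern is permissive: it has no check for what precedes the # and no check for a trailing semicolon.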

answered Apr 01 '20 at 13:31

nhCoder


2

Best answer on here, thank you. Only modification that may be needed is to allow åççéñts if your software will be international. Maybe something like `(#+[a-zA-Z0-9A-Za-zÀ-ÖØ-öø-ʸ(_)]{1,})` – Albert Renshaw Feb 20 '21 at 00:44
Perfect, but ####tag is also valid. UPD: `^#[a-zA-Z-а-яА-ЯÀ-ÖØ-öø-ʸ0-9(_)]{1,}$` – vusaldev Apr 07 '23 at 10:03
Why does this answer include brackets `()` as a valid hashtag character? Also why does it allow multiple hashtags like ##hashtag? Also why is `{1,}` used, if a simple `+` would be sufficient? – NicoHood Jul 01 '23 at 10:29

score 8 · Answer 3 · answered Feb 06 '20 at 18:08


/(#(?:[^\x00-\x7F]|\w)+)/g

Starts with #, followed by one or more (+) characters that are either non-ASCII ([^\x00-\x7F], a negated class that excludes the ASCII range) or word characters (\w).

This should cover hashtags containing non-ASCII characters, such as "#їжак".
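
A short JavaScript sketch of how this behaves on mixed text. The test strings are made up for illustration, and the `letterTag` variant at the end is a hypothetical alternative using Unicode property escapes, not something from this answer.

```javascript
// Pattern from this answer: '#' followed by one or more characters that are
// either outside the ASCII range or word characters.
const nonAsciiTag = /(#(?:[^\x00-\x7F]|\w)+)/g;

const mixed = "Привіт #їжак and #caf\u00e9!";
const mixedTags = mixed.match(nonAsciiTag);
console.log(mixedTags); // both tags, accented and Cyrillic letters included

// Any non-ASCII character is accepted, including currency signs.
const poundTags = "#£5".match(nonAsciiTag);
console.log(poundTags); // [ '#£5' ]

// Hypothetical stricter variant (an assumption, not from the answer):
// Unicode letters, digits and '_' only. Requires the 'u' flag.
const letterTag = /#[\p{L}\p{N}_]+/gu;
const strictTags = mixed.match(letterTag);
const strictPound = "#£5".match(letterTag);
console.log(strictTags, strictPound);
```

Neither pattern checks for a trailing semicolon, so the lookahead from the other answers would still have to be added for the original question.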

answered Feb 06 '20 at 18:08

ne4istb


score 4 · Answer 4 · answered Jul 21 '16 at 14:15


You can use a negative lookahead regex:

/(?<=[\s>]|^)#(\w*[A-Za-z_]+\w*)\b(?!;)/

\b - word boundary ensures that we are at end of word
(?!;) - asserts that we don't have semi-colon at next position
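
A minimal JavaScript sketch of the result on the question's string, assuming an engine with lookbehind support. The `g` flag and the variable names are added here for the demonstration.

```javascript
const sample5 = "Here's a #hashtag and here is #not_a_tag; which should be different. " +
  "Also testing: Mid#hash. #123 #!@£ and <p>#hash</p>";

// Pattern from the answer, with 'g' added to collect every match.
const lookaheadPattern = /(?<=[\s>]|^)#(\w*[A-Za-z_]+\w*)\b(?!;)/g;

const tags5 = sample5.match(lookaheadPattern);
console.log(tags5); // [ '#hashtag', '#hash' ]

// Tags that start with digits still match, as long as they contain a letter.
const digitTag = "#123hashtag ok".match(lookaheadPattern);
console.log(digitTag); // [ '#123hashtag' ]
```

The \b matters: without it, the engine could backtrack to a shorter match such as #not_a_ta, which is not followed by a semicolon.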

RegEx Demo

answered Jul 21 '16 at 14:15

anubhava


For performance `\B#(\d*[A-Za-z_]+\w*)\b(?!;)` should be your regex. `#[a-zA-Z]+` won't match `#123hashtag` – anubhava Jul 21 '16 at 16:17
Plus, turned out, 'look behind regex' is not supported for Safari. – Vano Mar 26 '21 at 09:39
Yes that's right, it was never meant to be Safari compatible though – anubhava Mar 26 '21 at 09:50
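
For engines without lookbehind support, the `\B` variant from the first comment avoids the lookbehind entirely. A sketch in JavaScript, with the `g` flag and variable names added as assumptions:

```javascript
const sample6 = "Here's a #hashtag and here is #not_a_tag; which should be different. " +
  "Also testing: Mid#hash. #123 #!@£ and <p>#hash</p>";

// Variant from the comment: \B replaces the lookbehind, so the pattern also
// parses in engines that lack lookbehind. 'g' added to collect all matches.
const noLookbehind = /\B#(\d*[A-Za-z_]+\w*)\b(?!;)/g;

const tags6 = sample6.match(noLookbehind);
console.log(tags6); // [ '#hashtag', '#hash' ]

// Caveat: \B is looser than the lookbehind. A '#' that follows any non-word
// character, such as '!', is accepted as well.
const afterBang = "!#tag".match(noLookbehind);
console.log(afterBang); // [ '#tag' ]
```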

score 1 · Answer 5 · answered Jul 21 '16 at 14:43

Similar to anubhava's answer, but with the two instances of \w* swapped for \d*, since the only difference between \w and [A-Za-z_] is the digits 0-9.

This reduces the number of steps from 588 to 90.

(?<=[\s>])#(\d*[A-Za-z_]+\d*)\b(?!;)
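
A JavaScript sketch of this pattern on the question's string, assuming lookbehind support. The `g` flag, the variable names and the two extra test strings are my additions. The extra cases show two behavioural differences from the original expression that come with the step reduction.

```javascript
const sample7 = "Here's a #hashtag and here is #not_a_tag; which should be different. " +
  "Also testing: Mid#hash. #123 #!@£ and <p>#hash</p>";

// Pattern from this answer, with 'g' added to collect every match.
const fewerSteps = /(?<=[\s>])#(\d*[A-Za-z_]+\d*)\b(?!;)/g;

const tags7 = sample7.match(fewerSteps);
console.log(tags7); // [ '#hashtag', '#hash' ]

// The lookbehind has no '^' alternative, so a tag at the very start of the
// input is not matched.
const atStart = "#start of line".match(fewerSteps);
console.log(atStart); // null

// With \d* at the end instead of \w*, letters after a digit are not allowed.
const letterAfterDigit = " #abc1def".match(fewerSteps);
console.log(letterAfterDigit); // null
```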

Regex101 demo

SVG-Heart · Answer 6 · 2021-04-10T16:03:33.350

1

(?<=(\s|^))#[^\s\!\@\#\$\%\^\&\*\(\)]+(?=(\s|$))

A regex that matches any hashtag.

With this approach, any character is accepted in a hashtag except whitespace and the signs !@#$%^&*()

Usage Notes

Turn on "g" and "m" flags when using!

It was tested for Java and JavaScript using https://regex101.com and VSCode.

It is available on this repo.
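
To see how this behaves on the string from the question, here is a sketch in JavaScript with the "g" and "m" flags turned on as described, assuming lookbehind support. The variable names are illustrative.

```javascript
const sample8 = "Here's a #hashtag and here is #not_a_tag; which should be different. " +
  "Also testing: Mid#hash. #123 #!@£ and <p>#hash</p>";

// Pattern from this answer with the "g" and "m" flags.
const anyHashtag = /(?<=(\s|^))#[^\s\!\@\#\$\%\^\&\*\(\)]+(?=(\s|$))/gm;

const tags8 = sample8.match(anyHashtag);
console.log(tags8); // [ '#hashtag', '#not_a_tag;', '#123' ]
```

The semicolon is not in the excluded set, so #not_a_tag; is matched together with its semicolon, and the #hash inside the p element is skipped because it is not surrounded by whitespace.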

edited Apr 10 '21 at 16:03

answered Apr 10 '21 at 15:15


Don't think your answer is answering OP questions: https://regex101.com/r/FFvPfn/1 OP doesn't want to match the semicolon. For the future it's better to share direct regex101 demo/snippet instead of just link to the landing page. – Anton Krug Apr 10 '21 at 15:54

score 0 · Answer 7 · answered Mar 07 '21 at 15:15


You could try this pattern: /#\S+/

It matches a # followed by every character up to the next whitespace.
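
A JavaScript sketch of the result on the question's string. The `g` flag and the variable names are added for the demonstration.

```javascript
const sample9 = "Here's a #hashtag and here is #not_a_tag; which should be different. " +
  "Also testing: Mid#hash. #123 #!@£ and <p>#hash</p>";

// Pattern from this answer, with 'g' added to collect every match.
const greedyTag = /#\S+/g;

const tags9 = sample9.match(greedyTag);
console.log(tags9);
// [ '#hashtag', '#not_a_tag;', '#hash.', '#123', '#!@£', '#hash</p>' ]
```

Trailing punctuation and markup end up inside the match, including the semicolon the question wants to exclude.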

answered Mar 07 '21 at 15:15

Ajay Lingayat

