18

I need a regex to match whole words that begin with $. What is the expression, and how can it be tested?

Example:

This $word and $this should be extracted.

In the above sentence, $word and $this would be found.

edgerunner
node ninja

6 Answers

23

If you want to match only whole words, you need the word character class (\w):

\B\$\w+

This will match a $ followed by one or more letters, digits, or underscores, as long as the $ is not immediately preceded by a word character. Try it out on Rubular.
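To address the "how can it be tested" part, here is a minimal sketch that runs the pattern against the sentence from the question. It assumes Python and its standard `re` module, which the question does not specify; the helper name `find_dollar_words` is hypothetical.

```python
import re

# Pattern from this answer: a non-word-boundary, a literal $, then word characters.
DOLLAR_WORD = re.compile(r"\B\$\w+")


def find_dollar_words(text):
    """Return every $-prefixed word in text, in order of appearance."""
    return DOLLAR_WORD.findall(text)


# The sentence from the question.
print(find_dollar_words("This $word and $this should be extracted."))
# ['$word', '$this']

# A $ in the middle of a word is rejected by the leading \B.
print(find_dollar_words("aw$hucks"))
# []
```

Other engines may treat `\w` differently (for example, ASCII-only versus Unicode), so results on non-English text can vary.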

edgerunner
  • Isn’t “isn’t” a word? How about “o’clock”? Is “side-effect” a word? Yes, yes, and yes. Is “42” a word? What about “_____”? No and no. – tchrist May 04 '11 at 00:30
  • What I used was the regex definition of a word, which not surprisingly has a shortcut. That fits in with the examples in z-buffer's question. If he would clarify his definition of "whole word" as you asked, I can modify my answer to fit. – edgerunner May 04 '11 at 10:34
  • I agree that yours is the most natural solution in a programming context. It just doesn’t meet the definition of "word" most natural-language users would use. There are various refinements one can make on it, if one knows the target programming language. For example, Java adds `\p{Currency_Symbol}` to its allowed characters; some use `\p{ID_Start}\p{ID_Continue}*`; several languages admit `::` as a separator between classname and ident, etc. But without knowing what he really meant, it’s hard to do better I admit. The solution that questioned what to do about `$$foo` was also interesting. – tchrist May 04 '11 at 12:32
  • This would also match the part beginning with the dollar sign in "aw$hucks". I think the question is how would you match words that only begin with the dollar sign. – user1383418 Apr 27 '16 at 19:12
  • Good point. You could add a `\B` non-word-boundary assertion at the beginning and it would work as you say. – edgerunner Jul 21 '17 at 17:38
  • Maybe something has changed in regex: I found (here: http://regexr.com/) that \B does NOT match the start of a word, and that \b should be used to match the start of a word, resulting in: \b\$\w+ – pera Sep 12 '17 at 12:54
  • @pera, `$` is not a "word character" in regex, so its left boundary in this case needs to be a non-word to rule out `aw$hucks`. Since `w` is a word char, the boundary it shares with `$` is a word boundary. We need the opposite here. – edgerunner Aug 25 '20 at 21:55
18
\$(\w+) 

Explanation :

\$ : match a literal $ (escaped because $ is otherwise a special character)
() : capture matches in here (in most engines at least)
\w : match a-z, A-Z, 0-9 and _
+ : match the preceding item one or more times
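A short sketch of what the capture group changes in practice, assuming Python's standard `re` module (the question names no language; the variable names are illustrative). With one group in the pattern, `findall` returns the group contents, so the `$` itself is dropped.

```python
import re

# Pattern from this answer: literal $, then a captured run of word characters.
CAPTURE_NAME = re.compile(r"\$(\w+)")

sentence = "This $word and $this should be extracted."

# findall returns only group 1 when the pattern has exactly one group.
names = CAPTURE_NAME.findall(sentence)

# group(0) is the whole match, including the leading $.
full_matches = [m.group(0) for m in CAPTURE_NAME.finditer(sentence)]

print(names)         # ['word', 'this']
print(full_matches)  # ['$word', '$this']

# No boundary check here, so a $ inside a word still matches.
print(CAPTURE_NAME.findall("aw$hucks"))  # ['hucks']
```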

NoobEditor
Paystey
5

I think you want something like this:

/((?:^|(?<=\s))\$\w+)/

The outer parentheses just capture your result.

(?: ... ) groups the two alternatives without capturing them;

^ matches the beginning of your entire string;

| gives you a choice (OR);

(?<=\s) is a positive lookbehind that checks that there's a whitespace character \s immediately before the dollar sign \$.

Finally (to recap), if a $ is at the beginning of the string or is preceded by whitespace, then the regex checks to see if one or more word characters follow: \w+.

This would match:

$test one two three

and

one two three $test one two three

but not

one two three$test
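
The three cases above can be checked with a small sketch. It assumes Python's standard `re` module and uses a variant of the pattern in which the two anchors are grouped, `(?:^|(?<=\s))\$\w+`, so that `\w+` is required after either one; the function name is hypothetical.

```python
import re

# Start of string OR preceded by whitespace, then $ and one or more word characters.
LEADING_DOLLAR = re.compile(r"(?:^|(?<=\s))\$\w+")


def find_leading_dollar_words(text):
    """Return $-words that start the string or follow whitespace."""
    return LEADING_DOLLAR.findall(text)


print(find_leading_dollar_words("$test one two three"))
# ['$test']
print(find_leading_dollar_words("one two three $test one two three"))
# ['$test']
print(find_leading_dollar_words("one two three$test"))
# []
```

Lookbehind support varies between engines, so this form may need adjusting elsewhere.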
user1383418
1

For the testing part of your question, I can recommend http://myregexp.com/

Erikw
  • Welcome to Stack Overflow, Erikw. If what you want to say is not a direct answer to the question, please post a comment on the question or on one of the answers instead. – edgerunner May 04 '11 at 00:18
  • Was just thinking about this since I just noticed the comments. So an answer addressing only one part of a question is more suited as a comment? – Erikw May 04 '11 at 00:23
  • My bad, I just noticed the "how can it be tested" part in the question. :) No, this is OK as it is. – edgerunner May 04 '11 at 10:38
0

This should be self-explanatory:

\$\p{Alphabetic}[\p{Alphabetic}\p{Dash}\p{Quotation_Mark}]*(?<!\p{Dash})

Notice it doesn’t try to match digits or underscores or other silly things that words don’t have in them.

tchrist
0

Try this as your regex:

/ (\$\w+)/

\w+ means "one or more word characters". This matches anything beginning with $ followed by word characters, but only if it's preceded by a space.

You could test it with perl (the \ after the echo is just to break the long line):

> echo 'abc $def $ ghi $$jkl mnop' \
  | perl -ne 'while (/ (\$\w+)/g) {print "$1\n";} ' 
$def

If you don't use the space in the regex, you'll match the $jkl in $$jkl - not sure if that is what you want:

> echo 'abc $def $ ghi $$jkl mnop' \
  | perl -ne 'while (/(\$\w+)/g) {print "$1\n";} ' 
$def
$jkl
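
For readers without Perl at hand, the same comparison can be sketched in Python. This assumes the standard `re` module; the variable names are illustrative and not part of the original answer.

```python
import re

line = "abc $def $ ghi $$jkl mnop"

# Requires a literal space before the $; the group keeps the space out of the result.
with_space = re.findall(r" (\$\w+)", line)

# No leading space required, so the second $ of "$$jkl" starts a match.
without_space = re.findall(r"(\$\w+)", line)

print(with_space)     # ['$def']
print(without_space)  # ['$def', '$jkl']
```

Note that the space-prefixed form cannot match a $-word at the very start of the string, since nothing precedes it.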
Gert