
I am learning regular expressions, and they seem very confusing to me for now.

val.replace(/^[^a-zA-Z0-9]*|[^a-zA-Z0-9]*$/g, '');

In the above expression:

1) Which part says not to include whitespace? I am trying to exclude all non-alphanumeric characters.

2) Since I don't want even '$' and '_' (underscore), can I specify '$' and '_' in the expression, something like below?

val.replace(/^[^a-zA-Z0-9$_]*|[^a-zA-Z0-9$_]*/g, '');

3) Since 'x|y' means "find any of the alternatives specified", why is something like [^a-zA-Z0-9]|[^a-zA-Z0-9] used, which is the same on both sides?

Please help me understand this; I find it a bit confusing and difficult.

SsNewbie
  • Read http://stackoverflow.com/q/22937618/152786 for general info. – smathy Nov 11 '14 at 05:19
  • I found this expression while going through regular expression basics, i.e. to replace all non-alphanumeric characters. – SsNewbie Nov 11 '14 at 05:20

3 Answers


This regular expression removes all leading and trailing non-alphanumeric characters from the string.
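
A quick way to see this is to run the original expression on a sample string in JavaScript (the input below is made up for illustration). Only the leading and trailing runs are removed; the space in the middle survives because neither anchored alternative can reach it:

```javascript
// The expression from the question: strip non-alphanumerics at both ends.
const trimNonAlnum = /^[^a-zA-Z0-9]*|[^a-zA-Z0-9]*$/g;

const input = '  --hello world!!  '; // hypothetical sample input
const output = input.replace(trimNonAlnum, '');

console.log(JSON.stringify(output)); // "hello world"
```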

  1. It doesn't specifically mention whitespace. It just matches everything other than alphanumeric characters. Whatever is inside square brackets is a character set: [Whatever]. A leading caret (^) INSIDE the character set negates it. So [^a-zA-Z0-9]* means zero or more characters other than a-z, A-Z or 0-9.

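  2. The $ sign at the end means "end of string"; it has nothing to do with the $ and _ symbols. Those are already included in the character set, since it matches all non-alphanumeric characters. Adding $ and _ inside [^...] would actually do the opposite: [^a-zA-Z0-9$_] means "anything except letters, digits, $ and _", so $ and _ would be kept.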

  3. Refer to @smathy's answer.
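
The points above can be checked directly. In this sketch (test strings are arbitrary), the negated class already matches space, $ and _, while listing $ and _ inside the brackets makes the class skip them. The "variant" line uses the question's character set but keeps the end-of-string $ anchor so the comparison is like-for-like. The last line shows why A-Z (not A-z) matters: the ASCII range A-z also covers [ \ ] ^ _ and the backtick.

```javascript
const nonAlnum = /[^a-zA-Z0-9]/;       // one character that is NOT a letter or digit
const nonAlnumKeep = /[^a-zA-Z0-9$_]/; // ...and also NOT $ or _

// Whitespace, $ and _ are all "not alphanumeric", so the original set matches them.
console.log(nonAlnum.test(' '), nonAlnum.test('$'), nonAlnum.test('_')); // true true true

// Listing $ and _ inside [^...] excludes them from the set instead.
console.log(nonAlnumKeep.test('$'), nonAlnumKeep.test('_')); // false false

const original = '$$abc__'.replace(/^[^a-zA-Z0-9]*|[^a-zA-Z0-9]*$/g, '');
const variant = '$$abc__'.replace(/^[^a-zA-Z0-9$_]*|[^a-zA-Z0-9$_]*$/g, '');
console.log(original, variant); // abc $$abc__

// A-z spans ASCII 65..122, which includes the underscore (95).
console.log(/[A-z]/.test('_')); // true
```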

Also, just FYI: AFAIU, regular expressions can't be learned by scrolling through a tutorial. You need to go through the basics and try out the examples.

Jithin

Some basic info.

When you read regular expressions, you read them from left to right. That's how the engine does it.

This is important for alternations, since the alternative(s) on the left are always tried first.
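
A small JavaScript illustration of left-to-right alternation (the strings are arbitrary): at a given position the engine takes the first alternative that succeeds, not the longest one.

```javascript
const leftFirst = 'abc'.match(/a|ab/)[0];   // 'a' is tried first and succeeds
const longerFirst = 'abc'.match(/ab|a/)[0]; // 'ab' is tried first and succeeds
console.log(leftFirst, longerFirst); // a ab
```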

But in the case of a $ (EOL or EOS) anchor, it might be easier to read from right to left.

Built-in assertions such as the anchors ^ and $ and the word boundary \b, along with lookahead (?=) (?!) and lookbehind (?<=) (?<!), do not consume characters.

They act like single-path inline conditionals that either pass or fail; only if one passes is the expression to its right examined. So they do actually match something: they match a condition.
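
To see that assertions match a position rather than characters, here is a sketch using a lookahead, \b and ^ (the example strings are made up):

```javascript
// The lookahead checks for "USD" but does not include it in the match.
const amount = '100USD'.match(/\d+(?=USD)/)[0];
console.log(amount); // 100

// \b matches the empty positions at word edges, so replacing it inserts text there.
const marked = 'a cat'.replace(/\b/g, '|');
console.log(marked); // |a| |cat|

// ^ on its own succeeds at index 0 but consumes nothing.
const startMatch = 'abc'.match(/^/);
console.log(startMatch.index, startMatch[0].length); // 0 0
```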

Format your regex so you can see what it's doing (an app such as RegexFormat 5 can help):

   ^                # BOS
   [^a-zA-Z0-9]*    # Optional not any alphanum chars
|                 # or, 
   [^a-zA-Z0-9]*    # Optional not any alphanum chars
   $                # EOS
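
JavaScript regex literals have no free-spacing (x) flag, so the commented layout above can't be pasted in directly. One workaround, sketched here, is to build the pattern from commented string pieces with the RegExp constructor (in string form any backslash would need doubling, though this pattern has none):

```javascript
const parts = [
  '^',             // start of string
  '[^a-zA-Z0-9]*', // zero or more non-alphanumeric characters
  '|',             // or
  '[^a-zA-Z0-9]*', // zero or more non-alphanumeric characters
  '$',             // end of string
];
const built = new RegExp(parts.join(''), 'g');
console.log(built.source); // ^[^a-zA-Z0-9]*|[^a-zA-Z0-9]*$
```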

With the global flag, your regex will always match at the beginning of the string and at the end, even when there is nothing to remove, because of the anchors and because you don't actually require anything else to match (each alternative can match an empty string).
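
Listing the matches with matchAll makes this visible (sample strings are arbitrary). On a string with nothing to trim, both matches are empty; when there is trailing junk, JavaScript also reports one more empty match at the very end, after the trailing run has been consumed:

```javascript
const starRe = /^[^a-zA-Z0-9]*|[^a-zA-Z0-9]*$/g;

// Collect [index, matchedText] pairs for every match.
const listMatches = (re, str) => [...str.matchAll(re)].map(m => [m.index, m[0]]);

console.log(JSON.stringify(listMatches(starRe, 'abc')));     // [[0,""],[3,""]]
console.log(JSON.stringify(listMatches(starRe, '  abc  '))); // [[0,"  "],[5,"  "],[7,""]]
```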

So, basically, you should avoid matching things that are entirely optional next to the built-in anchors ^, $ and \b. That means your regex is better written as ^[^a-zA-Z0-9]+|[^a-zA-Z0-9]+$, since you don't care about the case where nothing is there (which is what the * zero-or-more quantifier also matches).
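
A sketch comparing the two versions on a few made-up inputs: with + the empty matches disappear, while the trimmed result stays the same.

```javascript
const starVersion = /^[^a-zA-Z0-9]*|[^a-zA-Z0-9]*$/g;
const plusVersion = /^[^a-zA-Z0-9]+|[^a-zA-Z0-9]+$/g;
const samples = ['abc', '  abc  ', '**a b**'];

// What the + version actually matches in each sample.
const plusMatches = samples.map(str => [...str.matchAll(plusVersion)].map(m => m[0]));

// Both versions produce the same string after replacement.
const sameResult = samples.every(str => str.replace(starVersion, '') === str.replace(plusVersion, ''));

console.log(JSON.stringify(plusMatches)); // [[],["  ","  "],["**","**"]]
console.log(sameResult);                  // true
```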

Good luck, keep studying.

nhahtdh

To answer your third question, the alternatives extend all the way to the / delimiters, so the two sides are not the same. In the original regex the left alternative is "all non-alphanumerics at the start of the string" and the right alternative is "all non-alphanumerics at the end of the string".
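
Splitting the original expression at the | shows the two anchored halves doing different jobs (the input string is arbitrary):

```javascript
const sample = '**abc**';
const startOnly = sample.replace(/^[^a-zA-Z0-9]*/, ''); // left alternative: start of string
const endOnly = sample.replace(/[^a-zA-Z0-9]*$/, '');   // right alternative: end of string
const both = sample.replace(/^[^a-zA-Z0-9]*|[^a-zA-Z0-9]*$/g, ''); // either one, globally
console.log(startOnly, endOnly, both); // abc** **abc abc
```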

smathy
  • I didn't know that; I wasn't aware of why the same pattern is used in both alternatives. That makes sense now. Thanks for helping out. – SsNewbie Nov 11 '14 at 06:55