
I have a function that uses a regular expression to remove HTML tags:

a.replace( /<.*?>/g, "");

However, if there are spaces they remain, for example:

<a href='site.com'>    testing</a>

That will keep the spaces. Also for something like this:

<a href='site.com'>    $20</a>

I would like the function to return only 20. So, the question is:

How do I modify the regular expression so that $ and spaces get removed as well?

luqita
  • 1
    http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags –  Aug 11 '11 at 17:30
  • great thread, very popular, but it doesn't answer my question :p – luqita Aug 11 '11 at 17:47
  • 2
    @luquita: He's got a point though, you really should be using DOM methods for this kind of thing. – FK82 Aug 11 '11 at 18:16
  • Simply use `a.innerText = "";` or `$(a).text("");` Regex is not the tool you're looking for. – rxgx Aug 11 '11 at 18:39
  • Using DOM is a good point here. For example `jQuery(" $20").text()` returns " $20" (StackOverflow strips the spaces) which is easier to process. Continuing `jQuery(…).text().replace(/[\s$]*/, '')` results in `20`. – Augustus Kling Aug 11 '11 at 18:53

3 Answers

3

You could extend your expression and use:

a.replace( /(?:\s|\$)*<.*?>(?:\s|\$)*/g, "");

This adds (?:\s|\$), which matches either a whitespace character (\s) or a dollar sign (\$). The $ must be escaped because an unescaped $ is an anchor that matches the end of the input (or the end of a line when the m flag is set). Putting ?: directly after the opening parenthesis makes the group non-capturing: it groups the alternatives for matching but is not returned as a capture group.

The pattern occurs twice to allow removal of whitespace or $ signs before or after the tag.
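As a quick check, here is a minimal sketch that applies the pattern above to the two inputs from the question. The helper name `stripTags` is made up for illustration; only the regular expression comes from this answer.

```javascript
// Hypothetical helper; the pattern is the one suggested in this answer.
function stripTags(html) {
  // Remove each tag together with any whitespace or "$" directly
  // before or after it.
  return html.replace(/(?:\s|\$)*<.*?>(?:\s|\$)*/g, "");
}

const examples = [
  "<a href='site.com'>    testing</a>",
  "<a href='site.com'>    $20</a>",
];

for (const input of examples) {
  console.log(JSON.stringify(stripTags(input)));
}
// prints "testing"
// prints "20"
```

Only whitespace and dollar signs adjacent to a tag are removed, so text that sits between words and away from any tag is left alone.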

Augustus Kling
  • 1
    And the ?: is there to make it more l33t. :-) – Gerben Aug 11 '11 at 18:18
  • As mentioned, the `?:` is there so that the whitespace-$ compound does not count as a capture group. It does not matter for replacing, but in case somebody wants to capture some groups, it prevents surprises. I was just thinking the asker could possibly want to reuse the pattern and extend it to search for elements of the string. Then the asker would get unexpected additional captured groups if the `?:` is missing. – Augustus Kling Aug 11 '11 at 18:48
  • You can just replace the `(?:\s|\$)` with `[\s$]` (`[\s\$]` in some regex flavors). – Justin Morgan - On strike Aug 11 '11 at 19:10
0

Alternatively, to strip only the whitespace and dollar signs that follow a tag:

a.replace( /<.*?[>][\s$]*/g, "");
Joseph Marikle
0

Or, to also remove whitespace and dollar signs where no HTML tag is present:

a.replace( /(<.*?>)|([\s$])/g, "");
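The three patterns suggested in this thread behave differently once the input contains text outside the tags. The sketch below runs all three on one sample string; the sample input and the names `patterns` and `applyAll` are assumptions made for illustration, not part of the original answers.

```javascript
// The three regular expressions from the answers in this thread.
const patterns = {
  aroundTag: /(?:\s|\$)*<.*?>(?:\s|\$)*/g, // whitespace/$ before and after a tag
  afterTag: /<.*?[>][\s$]*/g,              // whitespace/$ after a tag only
  anywhere: /(<.*?>)|([\s$])/g,            // tags, plus every whitespace/$ character
};

// Hypothetical helper: apply each pattern to the same input.
function applyAll(input) {
  const results = {};
  for (const [name, pattern] of Object.entries(patterns)) {
    results[name] = input.replace(pattern, "");
  }
  return results;
}

const results = applyAll("Total: <b>$20</b> per item");
console.log(results.aroundTag); // "Total:20per item"
console.log(results.afterTag);  // "Total: 20per item"
console.log(results.anywhere);  // "Total:20peritem"
```

The last pattern strips every whitespace character, including the spaces between words, so it is probably only suitable when the input is a single value such as a price.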
Gerben