I'm trying to clean up some HTML text with JavaScript. There is whitespace before and after some of the words (the text is poorly formatted).

Currently I have this regex:

$("#" + target + " *").replaceText(/([\S][\u05B0-\u05C4]*)/gi, '<span class="marked">$1<\/span>');

This captures every non-whitespace character and wraps each one in a span element, but it does not capture the spaces between words, which I need wrapped in a span as well.

How would you solve this?

Shay Cojo
  • Is it intentional to use `\S` (all non white space)? Could you give example of input and desired output? – some Sep 09 '12 at 09:45
  • where does `$.fn.replaceText` come from? – Alexander Sep 09 '12 at 09:51
  • somewhat unclear. Can you provide example of desired output? – FilmJ Sep 09 '12 at 10:06
  • yeah, please example of a text input and output. – Stano Sep 09 '12 at 10:10
  • The \S was my attempt to capture all non white-space characters, it works but I'm losing the spaces in between words. Forgot to mention: replaceText is from http://benalman.com/projects/jquery-replacetext-plugin/ The input is a text nested within a table element (td), each nested element contains white-spaces (indentations), also the actual text is also poorly formatted (many spaces after some lines). When I use the function it wraps all those white-spaces inside spans (indent spaces, and the text's leading/trailing spaces). All I want is just the text and spaces in between words. – Shay Cojo Sep 09 '12 at 10:14
  • So, basically you want to replace multiple spaces with one space? Something like `string_with_text.replace(/\s{2,}/g,' ');` – some Sep 09 '12 at 10:29
  • First answer in [regex'ing html](http://stackoverflow.com/q/1732348/1081234) – Oleg Sep 09 '12 at 11:20

1 Answer

This matches each run of one or more space characters and replaces it with a single space:

'Quick   Brown      Fox'.replace(/[ ]+/g, ' '); //returns 'Quick Brown Fox'

This matches each run of whitespace characters (`\s`: spaces, tabs `\t`, newlines `\n` and carriage returns `\r`) and replaces it with a single space:

'Quick     Brown    Fox'.replace(/\s+/g, ' ');  //returns 'Quick Brown Fox'

I don't understand your explanation of what you're trying to achieve with span wraparounds, but you can do whatever you want with the output from above.
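As one possible reading of the question, here is a minimal sketch that first normalizes the whitespace and then wraps each remaining character, including the single spaces between words, in a span. The helper names `normalizeWhitespace` and `wrapCharacters` are made up for illustration and are not part of the replaceText plugin; the sketch also assumes `String.prototype.trim` is available and does not escape HTML special characters in the input.

```javascript
// Hypothetical helper: collapse every run of whitespace into one space,
// then drop the leading and trailing whitespace.
function normalizeWhitespace(text) {
  return text.replace(/\s+/g, ' ').trim();
}

// Hypothetical helper: wrap each non-whitespace character (together with any
// following code points in U+05B0..U+05C4, as in the question's pattern),
// or each single space between words, in its own span.
function wrapCharacters(text) {
  return normalizeWhitespace(text).replace(
    /(\S[\u05B0-\u05C4]*| )/g,
    '<span class="marked">$1</span>'
  );
}

// Indentation and trailing spaces are dropped; the space between words is kept.
console.log(normalizeWhitespace('  \n\t Quick   Brown \t\n Fox   '));
console.log(wrapCharacters(' a  b '));
```

Normalizing first means the wrapping pattern only ever sees single spaces between words, so no span is created for indentation or trailing whitespace.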

Oleg
  • Why do you have `/[ ]+/g` instead of `/ +/g`? – some Sep 09 '12 at 11:35
  • @some: If `\s` in the second example is too broad a match, OP can add other characters to the set in the first pattern (e.g. `/[ \t]+/g`) without breaking it, which is what would happen with `/ \t+/g`. It is implied that a *set* of characters is being replaced; it just so happens that the set consists of only one character. You're right though: in the first example specifically, there is no need to enclose a single character in set brackets. – Oleg Sep 09 '12 at 21:59
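
The difference between the two patterns in the last comment can be checked with a short sketch. The sample string is made up for illustration; `/[ \t]+/g` matches any run of spaces and tabs, while `/ \t+/g` only matches one space followed by one or more tabs.

```javascript
// Made-up sample: words separated by mixed spaces and tabs.
var input = 'Quick \t\t Brown\t\tFox';

// Character set: any run of spaces and/or tabs becomes one space.
var withSet = input.replace(/[ \t]+/g, ' ');

// No set: only "one space followed by tabs" is replaced, so the second
// space before "Brown" and the tabs before "Fox" are left untouched.
var withoutSet = input.replace(/ \t+/g, ' ');

console.log(JSON.stringify(withSet));    // "Quick Brown Fox"
console.log(JSON.stringify(withoutSet)); // "Quick  Brown\t\tFox"
```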