
I wrote a piece of JavaScript code and want to implement two functions:

1: Break a piece of text into an array of separate words. So far, I have used a regex that splits on spaces and punctuation. It does part of the job, but it cannot handle the non-breaking space entity &nbsp;.

2: Wrap each word in the HTML in a span tag. (I don't know how to implement this.)

this is the code:

<!DOCTYPE html>
<html>
<head>
    <script>
    window.onload = function() {
        var text = document.getElementById('text').textContent
            // Regex cannot search for `&nbsp;`
        var word_array = text.split(/[ \t\n\r.?,"';:!()[\]{}<>\/]/)
        console.log(text)
        console.log(word_array)
    }
    </script>
</head>

<body> 
    the other text
    <div id="text">
        this is
        text,break&nbsp;up&nbsp;&nbsp;the;words!
        istest testis,
        text <a href="#">text build</a> html tag! 
    </div>
    the other text
</body>

</html>

However, my code does not separate words that are joined by &nbsp;. For example, break&nbsp;up&nbsp;&nbsp;the should become [break,up,the].

Also, I have not managed to wrap every word in the div in a span tag. The result should look like this:

<div id="text">
<span id='word_1'>this</span> <span id='word_2'>is</span>
...
<span id='word_3'>text</span> <a href="#"><span id='word_4'>text</span> <span id='word_5'>build</span></a> <span id='word_6'>html</span> <span id='word_7'>tag</span>!
</div>
foobar
dong

1 Answer


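\s will do the job: textContent gives you each &nbsp; as the character U+00A0 (non-breaking space), which \s matches and a literal space does not. You can change: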

var word_array = text.split(/[ \t\n\r.?,"';:!()[\]{}<>\/]/)
                              ^^

to:

var word_array = text.split(/[\s\t\n\r.?,"';:!()[\]{}<>\/]/)
                              ^^

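By the way, in JavaScript \s already covers [ \t\r\n\f\v] plus U+00A0 and the other Unicode space characters, so the explicit \t, \n and \r are redundant. You can simplify your expression to: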

var word_array = text.split(/[\s.?,"';:!()[\]{}<>\/]/)

Then you may need to remove empty elements from array:

//remove '' from word_array
var word_array2 = word_array.filter(e => e != '')
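
To check the two steps together without a page, here is a minimal sketch. The helper name splitWords and the sample string are made up for illustration; "\u00a0" stands in for the character that &nbsp; becomes in textContent.

```javascript
// Hypothetical helper: split on the character class from the answer,
// then drop the empty strings produced by adjacent delimiters.
function splitWords(text) {
    return text
        .split(/[\s.?,"';:!()[\]{}<>\/]/)
        .filter(function (w) { return w !== ''; });
}

// "\u00a0" is the non-breaking space that &nbsp; turns into in textContent.
var sample = 'text,break\u00a0up\u00a0\u00a0the;words!';
console.log(splitWords(sample));
// logs [ 'text', 'break', 'up', 'the', 'words' ]
```

The double &nbsp; yields one empty string and the trailing ! yields another, which is why the filter step is needed.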

For question 2, the following code wraps each word in a span tag (edited based on the comment from @dong):

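function add_span(word_array, element_) {
    // element_ is an HTML string (e.g. innerHTML); the wrapped string is returned.
    for (let i = 0; i < word_array.length; i++) {
        // "\\s" is needed here: inside a string literal, "\s" collapses to a plain "s".
        var reg = new RegExp("([\\s.?,\"';:!(){}<>])(" + word_array[i] + ")([\\s.?,\"';:!])", 'g');
        element_ = element_.replace(reg, '$1<span>$2</span>$3');
    }
    return element_
}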
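
If you also want the numbered ids from the question and want to be sure existing tags such as <a href="#"> survive, one possible alternative is to split the HTML string into tags and text and only touch the text parts. This is a rough sketch, not a full HTML parser: the name wrapWords is made up, and it assumes no attribute value contains a ">" and that the element holds no script or style blocks.

```javascript
// Hypothetical helper: wraps every word of an HTML string in
// <span id='word_N'>, leaving tags and entities such as &nbsp; untouched.
function wrapWords(html) {
    var counter = 0;
    // The capturing group makes split() keep the tags in its result:
    // text lands at the even indexes, tags at the odd indexes.
    var parts = html.split(/(<[^>]*>)/);
    for (var i = 0; i < parts.length; i += 2) {
        // Match either an entity (returned unchanged) or a run of
        // characters outside the delimiter class (treated as a word).
        parts[i] = parts[i].replace(/&#?\w+;|[^\s.?,"';:!()[\]{}<>\/&]+/g, function (m) {
            if (m.charAt(0) === '&') {
                return m;
            }
            counter += 1;
            return "<span id='word_" + counter + "'>" + m + "</span>";
        });
    }
    return parts.join('');
}

console.log(wrapWords('text <a href="#">text build</a> html tag!'));
```

In the page from the question you could apply it with something like el.innerHTML = wrapWords(el.innerHTML), where el is the div with id "text". Walking the text nodes of the DOM would be more robust than string splitting, but the sketch keeps to the string approach used above.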
foobar
  • Can I add `&nbsp;` to my original regular expression instead of replacing it with spaces? – dong Jul 04 '22 at 18:33
  • Yes, you can, with the `\s` metacharacter, which matches whitespace. Please check the updated answer. – foobar Jul 04 '22 at 20:53
  • Awesome! Regular expression problem solved. For the second problem of wrapping all words in span tags, your code removes the HTML tags from the original code (e.g. <a href="#">text build</a>). I solved it according to your idea. ``` function add_span(word_array, element_){ for (let i=0; i<word_array.length; i++){ var reg = new RegExp("([\s.?,\"';:!(){}<>])(" + word_array[i] + ")([\s.?,\"';:!])", 'g'); element_ = element_.replace(reg,'$1<span>$2</span>$3'); } return element_ } ``` – dong Jul 04 '22 at 21:23