So I'm trying to get an array of all the words used in my web page.
Should be easy, right?
The problem I run into is that $("body").text().split(" ")
returns an array in which the last word of one element and the first word of the next are joined into a single entry.
For example:
<div id="1">Hello
<div id="2">World</div>
</div>
returns ["HelloWorld"]
when I want it to return ["Hello", "World"].
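One thing worth noting about the example above: split(" ") only breaks on a literal space character, so words separated by a newline or a tab stay fused. Here is a minimal sketch of splitting on any run of whitespace instead, assuming the text is already in a string. The helper name splitWords is made up for illustration.

```javascript
// Hypothetical helper: split a string into words on any run of whitespace.
function splitWords(text) {
  const trimmed = text.trim();
  // "".split(/\s+/) would give [""], so return an empty array explicitly.
  return trimmed === "" ? [] : trimmed.split(/\s+/);
}

// Text roughly as .text() might return it for markup spread over several lines.
const sample = "Hello\nWorld\n";
console.log(sample.split(" "));   // a single element with the newlines still inside
console.log(splitWords(sample));  // [ 'Hello', 'World' ]
```

This does not help when two elements sit next to each other with no whitespace at all between them, e.g. <b>Hello</b><b>World</b>. In that case the text of each node has to be collected separately.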
I also tried:
wordArr = [];
function getText(target)
{
    if($(this).children())
    {
        $(this).children(function(){getText(this)});
    }
    else
    {
        var testArr = $(this).text().split(" ");
        for(var i =0; i < testArr.length; i++)
            wordArr.push(testArr[i]);
    }
}
getText("body");
but $(node).children()
always returns a jQuery object, which is truthy even when it is empty, so the else branch never runs and that didn't work.
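For comparison, here is a rough sketch of the recursive idea done over text nodes rather than element children. It is not a tested solution for every page, and it assumes only the standard DOM node properties nodeType, nodeValue and childNodes. The function name collectWords is hypothetical.

```javascript
// Sketch: collect words by visiting every text node under a root node.
// Uses only standard DOM node properties: nodeType, nodeValue, childNodes.
const TEXT_NODE = 3; // same value as Node.TEXT_NODE in browsers

function collectWords(node, words = []) {
  if (node.nodeType === TEXT_NODE) {
    // Split each text node on its own so adjacent nodes are never fused.
    const trimmed = node.nodeValue.trim();
    if (trimmed !== "") {
      words.push(...trimmed.split(/\s+/));
    }
  } else {
    // childNodes includes text nodes; jQuery's .children() returns elements only.
    for (const child of Array.from(node.childNodes || [])) {
      collectWords(child, words);
    }
  }
  return words;
}

// In a browser (assumed usage): const wordArr = collectWords(document.body);
```

Checking $(this).children().length would fix the truthiness test, but .children() skips text nodes, so in the example markup "Hello" would still be lost because its div also has an element child. Note that this sketch also picks up the contents of any script or style elements inside the body, which may need filtering out.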
I'm sure I'm missing something obvious, so I'd appreciate an extra set of eyes.
For what it's worth, I don't need unique words, just every word in the body of the document as an element in the array. I'm trying to use it to generate context and lexical co-occurrence with another set of words, so duplicates just up the contextual importance of a given word.
Thanks in advance for any ideas.
See Fiddle