I am trying to extract the full text content of an HTML document for further computation. I found a jQuery solution, but it only gets me part of the way. The following code produces the expected output:

$(document).ready(function(){ 
    console.log($("*").text())
})

This logs the text I want. I would like to store that content in a variable instead of only printing it to the console. When I tried something like

var words = []
$(document).ready(function(){ 
    words.push($("*").text())
})
console.log(words)

it returns undefined. I learned that this is because the callback runs asynchronously. How should I approach this issue? Thanks in advance.

freedomn-m
  • `document.ready` is not "async" - it's an event that will fire... well.. when the doc is ready (the effect is the same - it runs later - the cause is different). So you've done this: 1) var words, 2) setup an event to fire later, 3) log words, 4) event fires and adds the text to the variable. To fix, put the console.log inside the doc ready (and the var if you like) – freedomn-m Feb 12 '20 at 16:33
  • `var words = [];console.log(words)` won't give you "undefined" - if you're seeing "undefined" in the console, you're seeing the return value from `console.log()` - the line above is the output. – freedomn-m Feb 12 '20 at 16:34
  • Do you need the entire structure of the HTML document? Or you're looking for all the text inside the webpage? – Juan Marco Feb 12 '20 at 16:53
  • @JuanMarco I'm looking for all the text inside the webpage. – K.R. Vijayalakshmy Feb 12 '20 at 16:58
  • @freedomn-m I did try including the log function inside the callback and it works fine. Will I be able to use the variable in an async function? – K.R. Vijayalakshmy Feb 12 '20 at 17:00
  • Can you share a demo? – vadivel a Feb 12 '20 at 17:05
  • I'm sorry, I don't quite follow, @vadivela. By demo, do you mean you want me to show what the output should look like? – K.R. Vijayalakshmy Feb 12 '20 at 17:06
  • You need to embrace asynchronous / event-driven functionality. The answer to your question "will I be able to use the variable in an async function" depends on what you need it for / how you want to use it. Most people ask if they can use an async value *outside* the async function (rather than inside) and that's a no, because it's not set yet. Your question is about events rather than async, but they're essentially the same; you can read this: https://stackoverflow.com/questions/14220321/how-do-i-return-the-response-from-an-asynchronous-call – freedomn-m Feb 12 '20 at 17:07
  • @freedomn-m Thank you – K.R. Vijayalakshmy Feb 12 '20 at 17:08
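
The ordering described in the comments above can be reproduced without a browser. The sketch below is illustrative only: `onReady`, `whenReady` and `countWords` are hypothetical names, and `onReady` merely simulates `$(document).ready` with `setTimeout`, an assumption made so the snippet runs in plain JavaScript (for example under Node.js).

```javascript
// Hypothetical stand-in for $(document).ready(): it runs the callback later,
// once the current script has finished, much like an event handler.
function onReady(callback) {
  setTimeout(callback, 0);
}

var words = [];            // 1) declare the variable
var lengthSeenOutside;
var lengthSeenInside;

onReady(function () {      // 2) register a handler that fires later
  words.push("text collected from the page"); // 4) runs last
  lengthSeenInside = words.length;
  console.log("inside handler:", words.length);
});

lengthSeenOutside = words.length; // 3) runs before the handler
console.log("outside handler:", words.length);

// To use the value elsewhere, wait for the event first, e.g. via a Promise.
function whenReady() {
  return new Promise(function (resolve) {
    onReady(resolve);
  });
}

async function countWords() {
  await whenReady();       // resumes after the handler registered earlier has run
  return words.length;
}

countWords().then(function (count) {
  console.log("after awaiting ready:", count);
});
```

The "outside handler" line is logged first with a length of 0, because the handler has not fired yet. Code that needs the populated array has to run inside the handler or after something that waits for it.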

2 Answers

Consider your selector: I think `$("*")` is grabbing too many elements. Look at the following.

$(function() {
  var words = [];
  $("body").children().not("script").each(function(i, el) {
    words.push($(el).text().trim());
  });
  console.log(words);
});
<script src="https://cdnjs.cloudflare.com/ajax/libs/jquery/3.3.1/jquery.min.js"></script>
<h1>Introduction</h1>
<p>This is a paragraph. </p>
<div class="footer">02.12.2020</div>

This will iterate over all the child elements of the body tag. It reads the text of each element and stores it as its own entry in the array. Your result would look like:

[
  "Introduction",
  "This is a paragraph.",
  "02.12.2020"
]
Twisty
  • Thank you for the solution. It's simple and easy to understand. But can you tell me what `$(function()...` means? Since both the answers are the same I'll go with one of them. Thanks again! – K.R. Vijayalakshmy Feb 13 '20 at 16:28
  • Will I be able to access the words list beyond the callback by parametrizing it? – K.R. Vijayalakshmy Feb 13 '20 at 16:40
  • @K.R.Vijayalakshmy please read: https://stackoverflow.com/questions/8396407/jquery-what-are-differences-between-document-ready-and-window-load – Twisty Feb 13 '20 at 16:43
  • @K.R.Vijayalakshmy as the variable `words` was defined outside the `each` loop callback and then is populated during the callback, yes the content will be accessible after the loop completes. – Twisty Feb 13 '20 at 16:45
  • Yes, I am able to do so. Thank you. I just wanted to clarify. – K.R. Vijayalakshmy Feb 13 '20 at 16:45
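
As a rough illustration of the point about `words` being available once the loop completes, the sketch below repeats the answer's approach with plain DOM properties (`children`, `tagName`, `textContent`) swapped in for jQuery. `collectChildTexts` is a hypothetical helper name, and `fakeBody` is an assumed stand-in shaped like the demo markup so the snippet can run outside a browser; in a page you would pass `document.body` once the document is ready.

```javascript
// Hypothetical helper: collect the trimmed text of each direct child of
// `root`, skipping <script> elements (similar to .children().not("script")).
function collectChildTexts(root) {
  var words = []; // declared outside the loop callback
  Array.from(root.children).forEach(function (el) {
    if (el.tagName !== "SCRIPT") {
      words.push(el.textContent.trim());
    }
  });
  return words; // fully populated once forEach has returned
}

// Assumed stand-in for document.body, shaped like the demo markup.
var fakeBody = {
  children: [
    { tagName: "SCRIPT", textContent: "/* library code */" },
    { tagName: "H1", textContent: "Introduction" },
    { tagName: "P", textContent: "This is a paragraph. " },
    { tagName: "DIV", textContent: "02.12.2020" }
  ]
};

var result = collectChildTexts(fakeBody);
console.log(result);
```

The loop itself is synchronous, so the array can be returned and used immediately afterwards; only the wait for the document to become ready is deferred.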

One way to do this would be to get all the elements in the body and iterate over them to get their text content. With jQuery it would look something like this:

$(document).ready(function() {
  
  let content = []
  
  // Select every element inside <body> except <script> elements
  $('body *:not(script)').each((i, el) => {
    content.push($(el).text())
  })
  
  console.log(content)
})
<script src="https://cdnjs.cloudflare.com/ajax/libs/jquery/3.3.1/jquery.min.js"></script>

<aside>
  <h1>JS Documentation</h1>
  <ul>
    <li>Introduction</li>
    <li>What you should already know</li>
  </ul>
</aside>
<main>
  <h2>Introduction</h2>
  <p>JavaScript is a cross platform...</p>
</main>

Note: the :not(script) selector will leave out any <script> tags (if present) in the <body> of the document.
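
Because `.text()` on an element includes the text of all its descendants, collecting it for every nested element repeats text (here, the list items appear once for the `<ul>` and again for each `<li>`). If each piece of text is wanted only once, one option is to walk the text nodes instead. This is a minimal sketch under stated assumptions: it uses plain DOM properties (`nodeType`, `childNodes`, `nodeValue`, `tagName`) rather than jQuery, the names `collectTextNodes`, `el`, `txt` and `fakeBody` are hypothetical, and the fake tree only imitates the markup above so the snippet can run without a browser.

```javascript
var TEXT_NODE = 3;    // same value as Node.TEXT_NODE in browsers
var ELEMENT_NODE = 1; // same value as Node.ELEMENT_NODE in browsers

// Hypothetical helper: depth-first walk that records each non-empty text
// node once and does not descend into <script> elements.
function collectTextNodes(node, out) {
  out = out || [];
  if (node.nodeType === TEXT_NODE) {
    var text = node.nodeValue.trim();
    if (text !== "") {
      out.push(text);
    }
  } else if (node.nodeType === ELEMENT_NODE && node.tagName !== "SCRIPT") {
    Array.from(node.childNodes).forEach(function (child) {
      collectTextNodes(child, out);
    });
  }
  return out;
}

// Tiny builders for a fake tree (assumptions, not a real DOM).
function el(tagName, childNodes) {
  return { nodeType: ELEMENT_NODE, tagName: tagName, childNodes: childNodes };
}
function txt(value) {
  return { nodeType: TEXT_NODE, nodeValue: value };
}

var fakeBody = el("BODY", [
  el("SCRIPT", [txt("console.log('skipped')")]),
  el("ASIDE", [
    el("H1", [txt("JS Documentation")]),
    el("UL", [
      txt("\n    "),
      el("LI", [txt("Introduction")]),
      txt("\n    "),
      el("LI", [txt("What you should already know")]),
      txt("\n  ")
    ])
  ]),
  el("MAIN", [
    el("H2", [txt("Introduction")]),
    el("P", [txt("JavaScript is a cross platform...")])
  ])
]);

var texts = collectTextNodes(fakeBody);
console.log(texts);
```

In a browser, the same function could be called with `document.body` inside the ready handler, since real DOM nodes expose the same properties.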

Tip: If you need to get rid of line breaks and leading or trailing whitespace, you can use something like this:

$(el).text().trim().replace(/\r?\n|\r/g, '')
Juan Marco
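
To make the whitespace tip above concrete, here is a small sketch on a plain string. `stripLineBreaks` and `normalizeText` are hypothetical helper names and the sample string is made up. The tip's pattern deletes line breaks outright, which can join words from adjacent lines; the second helper, which collapses any run of whitespace to a single space, is offered as an alternative and is not part of the answer.

```javascript
// The pattern from the tip: trim the ends, then delete every line break.
function stripLineBreaks(text) {
  return text.trim().replace(/\r?\n|\r/g, "");
}

// Alternative (an assumption, not from the answer): collapse each run of
// whitespace, including line breaks, into a single space.
function normalizeText(text) {
  return text.trim().replace(/\s+/g, " ");
}

var sample = "  Introduction\r\nWhat you should already know\n";

var stripped = stripLineBreaks(sample);
var normalized = normalizeText(sample);

console.log(JSON.stringify(stripped));   // "IntroductionWhat you should already know"
console.log(JSON.stringify(normalized)); // "Introduction What you should already know"
```

Which variant fits depends on whether the text will later be split into words; for word counting, keeping a separator between lines is usually the safer choice.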