DOM selection in iframes not properly returning text content

Question

QUICK NOTE: I would strongly prefer any solutions presented to be implemented via pure JavaScript and/or HTML changes. I don't have anything against jQuery or any other library, framework, or third-party tool personally, but I'm more interested in learning and improving than in applying the quickest fix without understanding what's going on.

I'm working with a page that displays information about chat sessions on a website. Each session is a row in a table with some basic information (name, location of user, date, etc.), and one column in each row holds a link to another page where the chat transcript can be viewed. I've been asked to create a button that, when clicked, goes through all the chat records on that particular page, collects all of the transcript data, and exports the result to a .csv file. I tried this a few different ways, and the only one that has worked correctly so far is to loop through the table via the class name attached to the link described above, open an invisible iframe for each link, and read the text data from the iframe. It doesn't seem like the most efficient solution, so even though this isn't my question, if anyone has a different way of doing this I'm certainly open to ideas.

The function I'm currently using looks like this:

        async function getFileContents() {
            var viewLinks = document.getElementsByClassName('view-link');
            var output = "";
            for(var i = 0; i < viewLinks.length; i++) {
                await new Promise(function(resolve, reject) {
                    var dataWindow = document.createElement("iframe");
                    dataWindow.setAttribute("src", viewLinks[i].href);
                    dataWindow.setAttribute("base", "target = _parent");
                    dataWindow.style.display = "none";
                    document.body.appendChild(dataWindow);
                    
                    dataWindow.onload = function() {
                        var iframe = dataWindow.contentDocument;
                        var transcriptTextHeader = iframe.querySelector(".transcript-text").textContent;
                        var transcriptText = iframe.querySelector('#transcript').textContent;
                        var formattedText = `${transcriptTextHeader} ${transcriptText} \n`;
                        output += formattedText;
                        resolve(output);
                        document.body.removeChild(dataWindow);
                    }
                });
            }
            download(output, "testoutput.csv");
            return output;
        }

Everything works correctly except for one problem that I haven't yet been able to figure out: the header text (retrieved with querySelector(".transcript-text")) works fine, but for some reason the transcript text itself, retrieved on the following line, is never pulled. Headers look fine in the output file, but there's no text underneath any of them. I've tried everything I can think of, but nothing seems to access the text at all. For reference, here is a short skeleton of the relevant section of the HTML on the transcript viewing page I'm opening in the iframes.

        <div class = "transcript-data">
        <!-- content here -->
          <div class = "transcript-text">
          <!-- header content is retrieved from here -->
            <pre id = "transcript">
            <!-- transcript text is here but not retrieved properly -->
            </pre>
          </div>
        </div>

My only two thoughts were the following:

1. The function isn't capable of retrieving multiple elements at once. I updated the selectors to try only the transcript text and it still didn't work, so this doesn't seem to be the case.
2. The section is not entirely loaded when the retrieval is done. This doesn't seem to be the case either, since its parent element is retrieved correctly. Getting output at the wrong time was an issue earlier with this task, which is why I updated the function to be asynchronous. The output gathering works correctly in terms of the order in which events occur.

When I look through the Chrome console and select the DOM elements of each iframe, I can see the text properly. It's just not being pulled by the script, even though its parent seems to have no problem. If anyone has ideas as to what might be occurring here, any insight would be greatly appreciated.

I'm not sure if it's the whole problem, but as a first step you need to change `var i` to `let i` so that `viewLinks[i]` will get the correct value of `i`. — Barmar, Jul 24 '21 at 17:41
I've updated it and still don't see a change. I'm not sure why my question was closed since it doesn't appear to be a scoping issue. As I mentioned in point 1 below the HTML snippet, the selector directly before the one I'm having trouble with works fine while being in the same block. It would seem to me that if scope issues were involved that one wouldn't work either, but perhaps I'm misunderstanding. — Evan Carlstrom, Jul 24 '21 at 18:01
Is the href of the iframe in the same domain as the main window? If not, you won't be able to access its contents. — Barmar, Jul 24 '21 at 18:07
Are you sure the selector is correct? If it can get `.transcript-text` contents it should be able to get `#transcript` as well. — Barmar, Jul 24 '21 at 18:10
I'm not sure why you need to get both. `#transcript` is inside `.transcript-text`, so `transcriptTextHeader` should include the `#transcript` contents. — Barmar, Jul 24 '21 at 18:12
Is `#transcript` filled in using JavaScript? `dataWindow.onload` doesn't wait for that JS to run. — Barmar, Jul 24 '21 at 18:13
You might be able to use a `MutationObserver` to do something when `#transcript` is filled in. But that will run asynchronously, you can't easily wait for it in the loop. — Barmar, Jul 24 '21 at 18:14
I looked at the file responsible for filling in the transcript information, and `.transcript-text` is being filled by a PHP variable while `#transcript` is filled in by a JavaScript function as you've mentioned, so that would seem to be responsible for the difference in being able to access them. Thanks for the help thus far, that's something I hadn't noticed before. Is `MutationObserver` the way to try and solve this then? I'll look into that and see if I can implement it. — Evan Carlstrom, Jul 24 '21 at 18:19
`MutationObserver` allows you to create a handler that runs when a DOM element is modified. So it's the way to wait for some other JS to update the element. — Barmar, Jul 24 '21 at 18:22
Thanks, I'll give that a try later today and see how it goes. — Evan Carlstrom, Jul 24 '21 at 18:31
After some fiddling around with it I am finally able to get the text working properly with MutationObserver, so a huge thanks to you @Barmar for pointing me in the right direction. Had no idea this tool existed and it's been quite straightforward to implement. This is my first dev job coming from an unrelated industry so I'm looking at all of these challenges as opportunities to fill in knowledge gaps. — Evan Carlstrom, Jul 26 '21 at 16:50
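
The approach suggested in the comments could look roughly like the following. This is a minimal sketch, not the code that was actually used: `waitForText` and `readTranscript` are hypothetical helper names, the 5-second timeout is an arbitrary choice, and it assumes the iframe is same-origin and that the page's script eventually writes non-empty text into `#transcript`.

```javascript
// Hypothetical helper: resolves with the element's trimmed text once it is non-empty.
// Rejects if nothing shows up within timeoutMs (the 5000 ms default is an assumption).
function waitForText(element, timeoutMs = 5000) {
    return new Promise(function (resolve, reject) {
        // The page's script may already have filled the element in.
        const initial = element.textContent.trim();
        if (initial !== "") {
            resolve(initial);
            return;
        }

        // Otherwise, re-check the text every time the element's subtree changes.
        const observer = new MutationObserver(function () {
            const text = element.textContent.trim();
            if (text !== "") {
                clearTimeout(timer);
                observer.disconnect();
                resolve(text);
            }
        });

        // Give up eventually so the surrounding loop cannot hang forever.
        const timer = setTimeout(function () {
            observer.disconnect();
            reject(new Error("Timed out waiting for transcript text"));
        }, timeoutMs);

        observer.observe(element, {
            childList: true,
            characterData: true,
            subtree: true
        });
    });
}

// Hypothetical helper: call this from the iframe's onload handler.
function readTranscript(iframe) {
    const transcriptElement = iframe.contentDocument.querySelector("#transcript");
    return waitForText(transcriptElement);
}
```

Inside `dataWindow.onload`, something like `readTranscript(dataWindow).then(function (text) { output += text; resolve(output); })` would then replace the direct `textContent` read, so the outer promise only resolves after the transcript has been filled in.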
