I am trying to get all visible text from a Tableau view using Selenium. I define all visible text as any text that can be searched using any browser's search functionality (i.e. Ctrl+F).

I have already searched many other answers to related questions but none of them worked for my case. I tried the top answer from here. It doesn't work for me from the very start since my browser.page_source contains no visible text. Here are the contents of my browser.page_source:

<!DOCTYPE html><html xmlns:ng="" xmlns:tb=""><head><style type="text/css">@charset "UTF-8";[ng\:cloak],[ng-cloak],[data-ng-cloak],[x-ng-cloak],.ng-cloak,.x-ng-cloak,.ng-hide:not(.ng-hide-animate){display:none !important;}ng\:form{display:block;}.ng-animate-shim{visibility:hidden;}.ng-anchor{position:absolute;}</style><meta charset="UTF-8"><meta http-equiv="X-UA-Compatible" content="IE=edge"><meta name="viewport" content="width=1024, maximum-scale=1.3"><meta name="apple-itunes-app" content="app-id=434633927"><meta name="format-detection" content="telephone=no"><script>var BuildId = '9qu3thidy901n388pewixusor';
var StaticAssetsUrlPrefix = '';</script><link rel="stylesheet" type="text/css" href="vizportal.css?9qu3thidy901n388pewixusor"><script src="/javascripts/api/tableau-2.1.0.min.js?9qu3thidy901n388pewixusor"></script><script src="vizportalMinLibs.js?9qu3thidy901n388pewixusor"></script><script src="vizportal.min.js?9qu3thidy901n388pewixusor"></script></head><body class="tb-body"><div ng-app="VizPortalRun" id="ng-app" tb-window-resize="" class="tb-app ng-scope"><!-- uiView:  --><div ui-view="" class="tb-app-inner ng-scope"></div><span class="ng-isolate-scope"><div class="tb-toaster tb-enable-selection" data-reactid=".0"></div></span><script type="text/ng-template" id="inline_stackedElement.html"><div tb-window-resize tb-left="left" tb-top="top" tb-right="right" tb-bottom="bottom" tb-visible="visible" class="tb-absolute"></div></script><!-- ngRepeat: stackedElement in stackedElements --><span props="stackedComponentsProps" class="ng-isolate-scope"><div data-reactid=".1"></div></span></div></body></html>

I also tried the top answer here. That didn't work either, since there is no text inside the body, as you can see in the page source above.

What is the correct way to get the visible text in these circumstances?

Eduard Florinescu
Max Mikhaylov

1 Answer

As I keep saying, page_source returns the page source, which is not the same thing as what Inspect Element shows. Inspect Element inspects the DOM. The page source is practically the original seed for the DOM, but the DOM can change dynamically, usually through JS code, and sometimes quite dramatically. You will also notice that Inspect Element shows shadow elements, which the source does not.

To see how dramatic the difference can be, visit chrome://settings/, click Inspect element, then open View page source and compare the two.

So you need to take what you need from the DOM. To do that, you could iterate through all the tags and read each one's textContent. This is the JS snippet:

var page = ""; var all = document.getElementsByTagName("*"); for (var i = 0; i < all.length; i++) page += all[i].textContent;

or in Selenium with Python:

from selenium import webdriver

driver = webdriver.Chrome()
driver.get("http://ranprieur.com")

# Run the snippet in the page; the index loop avoids for...of, which older JS engines reject.
script = ('var page = ""; var all = document.getElementsByTagName("*"); '
          'for (var i = 0; i < all.length; i++) page += all[i].textContent; return page;')
pagetext = driver.execute_script(script)
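Two caveats may matter for a page like the one in the question. Concatenating textContent for every tag repeats nested text, because a parent's textContent already contains the text of its descendants, and it also picks up the contents of script and style tags. A view that is built by JS may also still be empty at the moment the script runs. The following is only a sketch of one way to handle both, not something taken from the question: it reads document.body.innerText, which approximates the rendered text, and polls until something non-blank appears. The helper name wait_for_visible_text and its timeout and poll parameters are made up for illustration.

```python
import time

# JavaScript evaluated in the page. innerText approximates the rendered text,
# so hidden elements and <script>/<style> contents are left out.
VISIBLE_TEXT_JS = "return document.body ? document.body.innerText : '';"


def wait_for_visible_text(driver, timeout=30.0, poll=0.5):
    """Poll the live DOM until it yields non-blank text, then return that text.

    `driver` is any object with an execute_script(script) method, such as a
    Selenium WebDriver. Raises TimeoutError if no text shows up in time.
    """
    deadline = time.monotonic() + timeout
    while True:
        # execute_script may return None if the page has no body yet.
        text = driver.execute_script(VISIBLE_TEXT_JS) or ""
        if text.strip():
            return text
        if time.monotonic() >= deadline:
            raise TimeoutError("no visible text within %.1f seconds" % timeout)
        time.sleep(poll)
```

With a real session this would be called as pagetext = wait_for_visible_text(driver) right after driver.get(...). Text that the page draws onto a canvas or serves as an image is not part of the DOM, so neither this helper nor the textContent loop can return it.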

Eduard Florinescu
  • I know that page source doesn't show any dynamic content, just didn't know any other way to do this. I tried to execute your code in Python and got the following exception: `selenium.common.exceptions.WebDriverException: Message: {"errorMessage":"Expected token 'in'","request":{"headers": ... *lots of header info*... }} Screenshot: available via screen` – Max Mikhaylov Feb 20 '18 at 00:23
  • @MaxLawnboy Run `Ctrl+Shift+J` Try pasting `page =""; var all = document.getElementsByTagName("*"); for (tag of all) page = page + tag.textContent; console.log(page);` Do you get the text you want ? – Eduard Florinescu Feb 20 '18 at 01:16
  • @MaxLawnboy I think the error might not be related to my code. I ran the code (see update) and for me it works just fine – Eduard Florinescu Feb 20 '18 at 01:29
  • Yes, you are correct. I get the expected result if I run your script from the browser console. I guess the error I got when accessing the webpage using Selenium webdriver is due to authentication issues. Will need to figure that out next. Thanks for your help! – Max Mikhaylov Feb 20 '18 at 01:33
  • @MaxLawnboy You could use profile folders and login manually, and login will be kept in cookies etc, see: https://stackoverflow.com/a/48873573/1577343 – Eduard Florinescu Feb 20 '18 at 01:37