I want to print/get only the visible text content (what the user is currently seeing) from any website.

I tried multiple approaches, but I get all the text from the page rather than only the intended text.

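from selenium import webdriver
from selenium.webdriver.common.by import By

options = webdriver.ChromeOptions()  # assumed: default options
driver = webdriver.Chrome(options=options)
url = "https://techcrunch.com/"
driver.get(url)
element = driver.find_element(By.XPATH, "//body")
# Returns the rendered text of the whole page, not just the part on screen
print(driver.execute_script("return arguments[0].innerText", element))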

Is there any way to get only the visible text?

Note: a pure JavaScript solution is more than welcome.

Mithilesh
  • 2
    Why are you spam-tagging [tag:javascript] here? I removed it. – U13-Forward May 06 '19 at 05:37
  • If you look at the code, driver.execute_script() takes JavaScript code; I think JavaScript is the only way to get the result. – Mithilesh May 06 '19 at 05:57
  • 1
    But that's not uncommon in Selenium, which is why only 731 questions have the [tag:python] [tag:javascript] [tag:selenium] tags (https://stackoverflow.com/questions/tagged/python+javascript+selenium). – U13-Forward May 06 '19 at 05:59
  • Can you share your use case? – cruisepandey May 06 '19 at 06:31
  • Thanks all, for your support. I found the solution: https://stackoverflow.com/questions/487073/how-to-check-if-element-is-visible-after-scrolling – Mithilesh May 07 '19 at 05:41
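
One way to apply the idea of comparing positions with the viewport is sketched below. It is a minimal example under stated assumptions, not the method from the linked question: the names `VIEWPORT_TEXT_JS` and `viewport_text` are hypothetical, and `driver` is assumed to be any object with Selenium's `execute_script` method. The script walks the text nodes of the page and keeps those whose rendered box overlaps the current viewport.

```python
# JavaScript executed inside the page. It walks all text nodes and keeps
# those whose rendered box overlaps the current viewport.
VIEWPORT_TEXT_JS = """
const vw = window.innerWidth || document.documentElement.clientWidth;
const vh = window.innerHeight || document.documentElement.clientHeight;
const walker = document.createTreeWalker(document.body, NodeFilter.SHOW_TEXT);
const range = document.createRange();
const out = [];
let node;
while ((node = walker.nextNode())) {
  const text = node.textContent.trim();
  if (!text) continue;
  const parent = node.parentElement;
  if (!parent) continue;
  if (['SCRIPT', 'STYLE', 'NOSCRIPT'].includes(parent.tagName)) continue;
  // visibility is inherited, so checking the parent is enough here.
  if (window.getComputedStyle(parent).visibility === 'hidden') continue;
  range.selectNodeContents(node);
  const r = range.getBoundingClientRect();
  // Text inside display:none ancestors has an empty box.
  if (r.width === 0 || r.height === 0) continue;
  // getBoundingClientRect is relative to the viewport.
  if (r.bottom > 0 && r.top < vh && r.right > 0 && r.left < vw) {
    out.push(text);
  }
}
return out;
"""


def viewport_text(driver):
    """Return the text fragments currently inside the viewport, one per line."""
    fragments = driver.execute_script(VIEWPORT_TEXT_JS)
    return "\n".join(fragments or [])
```

After `driver.get(url)`, calling `print(viewport_text(driver))` should print only what overlaps the window at the current scroll position. Text that is covered by another element or clipped with CSS is not filtered out by this sketch.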

1 Answer

Get the body element and use its .text property to get the text of that element.

Try this:

from selenium.webdriver.common.by import By

driver.get("https://techcrunch.com/")
element = driver.find_element(By.TAG_NAME, "body")
print(element.text)

If you are wondering why text such as "(opens in a new window)" appears in the result although you cannot see it on the page: it is present in the document. If you press Ctrl+A and copy the text, you get the same result, and you can also find it on the page with Ctrl+F.

The reason you do not see that text is that it is clipped using -webkit-clip-path.

The clip-path CSS property creates a clipping region that sets what part of an element should be shown. Parts that are inside the region are shown, while those outside are hidden.
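
If such clipped text should be skipped, one option is to read an element's computed style and apply a heuristic. The following is only a sketch: `STYLE_JS`, `is_visually_hidden`, and `element_is_clipped` are hypothetical names, the sets of "hiding" values are assumptions based on common screen-reader-only CSS helpers, and the exact serialization of computed values may differ between browsers.

```python
# JavaScript that returns a few computed-style properties of one element.
STYLE_JS = """
const s = window.getComputedStyle(arguments[0]);
return {
  display: s.display,
  visibility: s.visibility,
  clipPath: s.clipPath || s.webkitClipPath || 'none',
  clip: s.clip
};
"""

# Assumed values that typically clip an element away completely.
# This list is illustrative, not exhaustive.
HIDING_CLIP_PATHS = {"inset(50%)", "inset(100%)"}
HIDING_CLIPS = {"rect(0px, 0px, 0px, 0px)", "rect(1px, 1px, 1px, 1px)"}


def is_visually_hidden(style):
    """Guess from computed-style values whether an element is hidden from sight."""
    if style.get("display") == "none":
        return True
    if style.get("visibility") in ("hidden", "collapse"):
        return True
    clip_path = (style.get("clipPath") or "none").strip().lower()
    clip = (style.get("clip") or "auto").strip().lower()
    return clip_path in HIDING_CLIP_PATHS or clip in HIDING_CLIPS


def element_is_clipped(driver, element):
    """Fetch the computed style of a WebElement and apply the heuristic."""
    return is_visually_hidden(driver.execute_script(STYLE_JS, element))
```

Keeping the decision in a plain function makes it easy to adjust the rule set for a particular site without touching the browser code.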

S Ahmed
  • Thanks for your reply. Still, that gives the entire text content; I want just the content that is visible to the user. Content that only becomes visible after scrolling is also printed by this method, which I don't intend. – Mithilesh May 06 '19 at 05:46
  • 1
    Find which CSS classes are visible and which ones are not. Filter content according to that information. – Mika72 May 06 '19 at 05:49
  • `element.text` returns the visible text of the page. Which part do you think it shows that JavaScript would not? – S Ahmed May 06 '19 at 06:15
  • I ran the code on this page and I get text from the top ("stack overflow") to the bottom ("site design / logo.."). Here we can only see the footer after scrolling down, so it is not currently visible, but it is still printed, which I don't want. – Mithilesh May 06 '19 at 06:37
  • Both of the texts you mentioned are visible on the page and inside the body tag, so the code is doing what it is supposed to do. If you want the text not from the whole page but only from the part currently shown on screen without scrolling, that is an entirely different question. To the browser, the whole scrollable page is visible. – S Ahmed May 06 '19 at 06:51
  • Thanks. I want the part that is shown on the screen. Since the question is a little confusing, I will modify the question title. – Mithilesh May 06 '19 at 07:03
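
The distinction drawn in these comments (displayed anywhere on the scrollable page versus shown on screen right now) can also be checked from the Python side. The sketch below assumes that `WebElement.rect` reports document-relative coordinates, as the W3C WebDriver specification describes, and that `driver` and the elements follow Selenium's interface; `intersects_viewport` and `texts_in_viewport` are hypothetical helper names.

```python
# Returns the scroll offset and the size of the viewport in CSS pixels.
VIEWPORT_JS = (
    "return [window.scrollX, window.scrollY, "
    "window.innerWidth, window.innerHeight];"
)


def intersects_viewport(rect, scroll_x, scroll_y, width, height):
    """True if a document-relative rect overlaps the viewport.

    rect is a dict with x, y, width and height keys, like WebElement.rect.
    """
    if rect["width"] <= 0 or rect["height"] <= 0:
        return False
    # Convert document coordinates to viewport coordinates.
    left = rect["x"] - scroll_x
    top = rect["y"] - scroll_y
    return (
        left < width
        and left + rect["width"] > 0
        and top < height
        and top + rect["height"] > 0
    )


def texts_in_viewport(driver, elements):
    """Return the text of displayed elements that overlap the viewport."""
    scroll_x, scroll_y, width, height = driver.execute_script(VIEWPORT_JS)
    return [
        el.text
        for el in elements
        if el.is_displayed()
        and intersects_viewport(el.rect, scroll_x, scroll_y, width, height)
    ]
```

The elements could come from something like `driver.find_elements(By.CSS_SELECTOR, "p, h1, h2, a")`; the selector is only an example. Nested elements repeat their descendants' text, and each element costs several round trips to the browser, so this is slower than doing the filtering in JavaScript.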