
I'm scraping a private Tableau dashboard from a vendor and cannot select or use the embedded scrollbars that Tableau renders. I've tried scrolling, scrolling the element into view, and grabbing the scrollbar directly with JavaScript.

An example of the scrollbar I've encountered can be found at:

https://public.tableau.com/views/WorldIndicators-TableauGeneralExample/Story?%3Aembed=y&%3AshowVizHome=no&%3AshowTabs=y&%3Adisplay_count=y&%3Adisplay_static_image=y

The XPath I am using is:

/html/body/div[2]/div[3]/div[1]/div[1]/div/div[2]/div[4]/div/div/div/div/div[2]/div/div/div/div[1]/div[20]

I've attempted the options found here, here, and here.

I cannot seem to actually grab the scrollbar itself. The best I've been able to do is click the entire bar.

How can I advance this scrollbar to bring IDs into view as I iterate over them?

import os, sys, shutil, logging, os.path
import time  # needed for the time.sleep() calls below
from selenium import webdriver
from selenium.webdriver.support.select import Select
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import NoSuchElementException
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
from selenium.webdriver import ActionChains
from selenium.webdriver.chrome.options import Options
from azure.storage.blob import BlockBlobService

url = 'https://public.tableau.com/views/WorldIndicators-TableauGeneralExample/Story?%3Aembed=y&%3AshowVizHome=no&%3AshowTabs=y&%3Adisplay_count=y&%3Adisplay_static_image=y'

PATH = "/Users/171644/python_tools/chromedriver"  #change this
options = Options()
driver = webdriver.Chrome(PATH,options=options)
wait = WebDriverWait(driver, 120)

driver.get(url)
time.sleep(5)
driver.fullscreen_window()
time.sleep(10)

element = driver.find_element(By.ID, '10671917940_0')  # find_element_by_id was removed in newer Selenium 4 releases
actions = ActionChains(driver)
actions.move_to_element(element).perform()
zabada
  • What do you want to scrape: all the data from the table, or all the IDs? – MeT Jun 08 '22 at 20:05
    @MeT I want to get all the data. I have all the IDs already. I need to make the scroll bar move down to access the next ID. – zabada Jun 09 '22 at 16:39

2 Answers

1

This is not going to work because the element you are trying to access is located inside an iframe served from a different domain. You can read more about this under the Same-Origin Policy.

Additionally, this approach is likely to be slow and flaky for several reasons: embedded Tableau workbooks are rendered inside an iframe (you will have to locate each individual iframe), and the rendering is asynchronous, driven by AJAX calls, so you will have to rely heavily on explicit waits.

I would advise using a dedicated scraping tool instead.

Here is a short code snippet in case you want to follow up on the latter.

from tableauscraper import TableauScraper as TS
url = "https://public.tableau.com/views/WorldIndicators-TableauGeneralExample/Story?%3Aembed=y&%3AshowVizHome=no&%3AshowTabs=y&%3Adisplay_count=y&%3Adisplay_static_image=y"

ts = TS()
ts.loads(url)
workbook = ts.getWorkbook()

for t in workbook.worksheets:
    print(f"worksheet name : {t.name}") #show worksheet name
    print(t.data) #show dataframe for this worksheet
wiz
  • I have reviewed TableauScraper. I was unable to find a way to access private, password protected dashboards with it. I need to access a password protected dashboard and have therefore been using Selenium. If you know of a way to do this, please let me know. – zabada Jun 09 '22 at 16:34
  • I am not familiar with how Tableau handles authentication. Depending on that, you might be able to just set the auth header for the scraper. But again, it really depends on what they are using. – wiz Jun 10 '22 at 17:16
0

To use this code you need to install pyautogui (pip install pyautogui). With pyautogui you can move the mouse cursor over the table and then simulate scrolling down with the mouse wheel, so that all the rows are loaded.

Important: in the last line we need row.get_attribute('innerText') instead of row.text, because .text returns only the text of elements that are currently visible.

import pyautogui

driver.get(url)
time.sleep(5)
table = driver.find_element(By.CSS_SELECTOR, 'div.tabZone-viz')
c = table.rect

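# move the mouse to the center of the table
# (rect is in viewport coordinates, so this assumes the browser window fills the screen)
pyautogui.moveTo(c['x'] + c['width'] / 2, c['y'] + c['height'] / 2)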

# scroll to the bottom of the table
pyautogui.scroll(-9999)

# find the first cell of each row
rows = driver.find_elements(By.CSS_SELECTOR, 'div.tab-vizLeftSceneMargin div.tab-vizHeaderWrapper')

# print the content of the cells
[row.get_attribute('innerText') for row in rows]

Output

['Singapore',
 'Hong Kong SAR, C..',
 'New Zealand',
 'United States',
 ...
 'Congo, Rep.',
 'Central African Rep..',
 'Libya',
 'Chad']
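
If a single large scroll does not load every row (for example, if the table only renders rows near the visible area, which is an assumption about how a given dashboard behaves), scrolling in small steps and accumulating what is visible after each step may be more reliable. The sketch below is one possible way to do that; collect_while_scrolling, read_visible, scroll_step, max_steps, and patience are hypothetical names introduced here, not part of Selenium or pyautogui. It also assumes row labels are unique (as IDs would be), since duplicates are collapsed.

```python
from typing import Callable, Iterable, List


def collect_while_scrolling(
    read_visible: Callable[[], Iterable[str]],
    scroll_step: Callable[[], None],
    max_steps: int = 200,
    patience: int = 3,
) -> List[str]:
    """Scroll in small steps and accumulate every distinct row label seen.

    Stops after `patience` consecutive reads that reveal nothing new,
    or after `max_steps` iterations, whichever comes first.
    """
    seen = set()
    ordered: List[str] = []
    idle = 0  # consecutive reads that produced no new labels
    for _ in range(max_steps):
        new_found = False
        for label in read_visible():
            if label not in seen:
                seen.add(label)
                ordered.append(label)  # keep first-seen order
                new_found = True
        idle = 0 if new_found else idle + 1
        if idle >= patience:
            break  # probably reached the bottom of the table
        scroll_step()
    return ordered
```

With the names from the snippet above, read_visible could be a function returning [row.get_attribute('innerText') for row in driver.find_elements(By.CSS_SELECTOR, 'div.tab-vizLeftSceneMargin div.tab-vizHeaderWrapper')], and scroll_step could call pyautogui.scroll(-5) followed by a short time.sleep(). The step size and pause are guesses that would need tuning for a particular dashboard.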
sound wave