
I have to scrape data from a Tableau workbook into a CSV file: https://public.tableau.com/views/2020_04_06_COVID19_India/Dashboard_India_Cases?:embed=y&:showVizHome=no&:host_url=https%3A%2F%2Fpublic.tableau.com%2F&:embed_code_version=3&:tabs=no&:toolbar=yes&:animate_transition=yes&:display_static_image=no&:display_spinner=no&:display_overlay=yes&:display_count=yes&publish=yes&:loadOrderID=0

I have tried the following, but I am getting no output.

main.py

import requests
from bs4 import BeautifulSoup

url = "https://public.tableau.com/views/2020_04_06_COVID19_India/Dashboard_India_Cases?:embed=y&:showVizHome=no&:host_url=https%3A%2F%2Fpublic.tableau.com%2F&:embed_code_version=3&:tabs=no&:toolbar=yes&:animate_transition=yes&:display_static_image=no&:display_spinner=no&:display_overlay=yes&:display_count=yes&publish=yes&:loadOrderID=0"

r = requests.get(url)
soup = BeautifulSoup(r.content, "html.parser")

# Print the first cell of every row in every table found in the HTML
for table in soup.find_all("table"):
    for row in table.find_all("tr"):
        print(row.find("td"))
  • I don't see any `<table>` element on this page. – Alexandre B. Apr 25 '20 at 07:51
  • But how do I extract data from the workbook? The code I have written is, for example, for a normal site. – Vivian Mascarenhas Apr 25 '20 at 11:37
  • The data seems to be in a `canvas`. You should search for how to retrieve data from a `canvas` using Python. There are similar topics, like [this](https://stackoverflow.com/questions/48862653/how-to-retrieve-data-from-html-canvas-using-python) one or maybe this [one](https://stackoverflow.com/questions/30497537/scraping-graph-data-from-a-website-using-python). – Alexandre B. Apr 25 '20 at 14:45

1 Answer


I've made this Python Tableau scraper library, which lists the worksheets and exports the data of each worksheet into a pandas DataFrame. For example, the following gets the table you're looking for:

from tableauscraper import TableauScraper as TS

url = "https://public.tableau.com/views/2020_04_06_COVID19_India/Dashboard_India_Cases"

ts = TS()
ts.loads(url)
dashboard = ts.getDashboard()

for t in dashboard.worksheets:
    # show worksheet name
    print(f"WORKSHEET NAME : {t.name}")
    # show dataframe for this worksheet
    print(t.data)

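Since the question asks for CSV output, here is a minimal sketch of how the worksheets could be written to disk. It assumes each worksheet exposes `name` and a pandas DataFrame in `data`, as in the loop above; the helper `export_worksheets` and the `tableau_csv` directory name are hypothetical and not part of the library.

```python
import re
from pathlib import Path


def export_worksheets(worksheets, out_dir="tableau_csv"):
    """Write each worksheet's DataFrame to its own CSV file; return the paths.

    Hypothetical helper: it only relies on every worksheet having a `name`
    string and a `data` object with a pandas-style `to_csv` method.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for ws in worksheets:
        # Worksheet names may contain spaces or slashes, so keep only
        # characters that are safe in a file name.
        safe_name = re.sub(r"[^\w\-]+", "_", ws.name).strip("_") or "worksheet"
        path = out / f"{safe_name}.csv"
        ws.data.to_csv(path, index=False)  # pandas DataFrame.to_csv
        paths.append(path)
    return paths
```

With the `dashboard` object from the snippet above, this would be called as `export_worksheets(dashboard.worksheets)`. Note that two worksheets whose names sanitize to the same string would overwrite each other in this sketch.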

Bertrand Martel
  • Thanks for the library! I just ran it on a couple of random pages and it worked flawlessly! – Jack Fleeting Feb 23 '21 at 14:13
  • Does this also work with Tableau Server (not public) dashboards? I tried it on some of the dashboards that we published to our local Tableau server and keep getting the error `AttributeError: 'NoneType' object has no attribute 'text'` from line 80 in TableauScraper.py. It originates from the call to `soup.find("textarea", {"id": "tsConfigContainer"}).text`. – vmachan Jan 13 '22 at 17:09
  • Are there any examples that use the tableauserverclient library to log in to a Tableau server (not public) and then use TableauScraper to get the data? @bertrand-martel – vmachan Jan 13 '22 at 18:47
  • @vmachan It doesn't work on an authenticated Tableau server as is. I've never worked with an authenticated Tableau link, but if the authentication passes a cookie header, you could set `ts.session = requests.Session()` at the beginning, like this: https://stackoverflow.com/questions/17224054/how-to-add-a-cookie-to-the-cookiejar-in-python-requests-library. Not sure if it will work, though, if there are additional CSRF mechanisms in place like [this](https://stackoverflow.com/questions/63025296/scraping-data-from-a-tableau-map/64179945#64179945) – Bertrand Martel Jan 17 '22 at 17:56
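
The cookie suggestion in the last comment could look roughly like the sketch below. It is untested against a real authenticated server; the helper `build_session`, the cookie name, and the host name are all hypothetical placeholders, and as noted above it may not be enough if the server also enforces CSRF checks.

```python
import requests


def build_session(cookies, domain):
    """Return a requests.Session pre-loaded with the given cookies.

    Hypothetical helper: `cookies` maps cookie names to values copied from
    an already authenticated browser session, and `domain` is the host of
    the Tableau server.
    """
    session = requests.Session()
    for name, value in cookies.items():
        session.cookies.set(name, value, domain=domain)
    return session


# Assumed usage, following the comment's suggestion of assigning the
# session before loading the dashboard (all values are placeholders):
#
#   from tableauscraper import TableauScraper as TS
#   ts = TS()
#   ts.session = build_session({"my_auth_cookie": "value"}, "tableau.example.com")
#   ts.loads(url)
```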