I followed the post "How to scrape a public tableau dashboard?" to get the data dictionary for this workbook: https://tableau.gatech.edu/t/GT/views/GTCovid-19Tracking/Covid-19DataTable, using the following code:

import requests
from bs4 import BeautifulSoup
import json
import re

url = "https://tableau.gatech.edu/t/GT/views/GTCovid-19Tracking/Covid-19DataTable"
r = requests.get(
    url,
    params= {
        ":embed":"y",
        ":showAppBanner":"false",
        ":showShareOptions":"true",
        ":display_count":"no",
        "showVizHome": "no"
    }
)

soup = BeautifulSoup(r.text, "html.parser")
tableauData = json.loads(soup.find("textarea",{"id": "tsConfigContainer"}).text)

dataUrl = f'https://tableau.gatech.edu{tableauData["vizql_root"]}/bootstrapSession/sessions/{tableauData["sessionid"]}'
r = requests.post(dataUrl, data= {
    "sheet_id": 'OverTime Table',
})

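# The response contains two JSON blobs, each preceded by "<digits>;".
# A raw string avoids invalid-escape warnings for \d.
dataReg = re.search(r'\d+;({.*})\d+;({.*})', r.text, re.MULTILINE)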
info = json.loads(dataReg.group(1))
data = json.loads(dataReg.group(2))

print(data["secondaryInfo"]["presModelMap"]["dataDictionary"])

However, I am not sure how to create a pandas DataFrame from the data. The data dictionary does not store the data in tabular form; the values appear to be referenced by index. Is there a module, or some other way, to transform the dictionary into a DataFrame or another tabular format?
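
To make the goal concrete, here is a minimal sketch of the index lookup I have in mind. It assumes a simplified, hypothetical layout: a dictionary of value lists keyed by data type, and a list of columns that each carry a caption, a data type, and a list of indices into the matching value list. The key names (`caption`, `dataType`, `valueIndices`), the helper `resolve_columns`, and the sample values are all made up for illustration; the real payload is undocumented, nests this information under deeper keys, and may differ.

```python
import pandas as pd


def resolve_columns(columns, dictionary):
    """Build a DataFrame by replacing each index with the value it points to.

    `columns` and `dictionary` follow a hypothetical, simplified layout;
    they are not the exact structure returned by the server.
    """
    resolved = {}
    for col in columns:
        # Pick the value list that matches this column's data type.
        values = dictionary[col["dataType"]]
        # Look up every index in that value list.
        resolved[col["caption"]] = [values[i] for i in col["valueIndices"]]
    return pd.DataFrame(resolved)


# Made-up sample data, only to show the shape of the transformation.
dictionary = {
    "cstring": ["A", "B"],
    "integer": [3, 7, 12],
}
columns = [
    {"caption": "Label", "dataType": "cstring", "valueIndices": [0, 1, 0]},
    {"caption": "Count", "dataType": "integer", "valueIndices": [2, 0, 1]},
]

df = resolve_columns(columns, dictionary)
print(df)
```

If the real response can be reduced to these two pieces (value lists per data type, and per-column index lists), the same lookup should apply; locating those pieces inside the response is the part I am unsure about.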

Bill K