Requests.get() function does not give same result as the webbrowser.open() function

Question

I have a URL that I need to request in order to trigger a refresh. It refreshes the data cache and displays the latest uploaded data in Tableau Server. The URL looks like this:

http://servername/views/workbookname/dashboard1?:refresh=yes

When I use the webbrowser library to open the URL, the refresh is executed, but a browser window is left open. When I use requests to get the URL, it does not refresh, yet it gives me a Response of 200, which I assume means success.

Does anyone know why this happens? How can I silently use the webbrowser lib to open the URL and close it afterwards, or have requests act like a web browser when doing the get?

import webbrowser
url = 'http://servername/views/workbookname/dashboard1?:refresh=yes'
webbrowser.open(url)

import requests
url = "http://servername/views/workbookname/dashboard1?:refresh=yes"
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.76 Safari/537.36', "Upgrade-Insecure-Requests": "1","DNT": "1","Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8","Accept-Language": "en-US,en;q=0.5","Accept-Encoding": "gzip, deflate"}
html = requests.get(url,headers=headers)
print(html)

What is your final goal here? Just to refresh the data? Or to do some work with it afterwards? If so, what kind of work? Depending on your goal, the implementation could be entirely different. — Jack Taylor, Nov 05 '19 at 21:33
@JackTaylor Yes just to simulate the run of the webpage with the ?:refresh=yes — Jonathan Lam, Nov 05 '19 at 21:42
@RithinChalumuri No, I have tried it multiple times. No errors, I get the html text when I print the page_source. — Jonathan Lam, Nov 05 '19 at 21:45
[This post](https://community.tableau.com/thread/226546) on the Tableau Community Forums suggests that you should use the API to issue the Refresh command directly instead of through the browser via code. They do have a [python library](https://tableau.github.io/server-client-python/docs/) available that you can use (as opposed to JavaScript). — b_c, Nov 05 '19 at 21:50
When you open your URL, the `webbrowser` module launches your default browser, which already has your credentials/cookies, so it works. If your URL needs any authentication or login to access, you'll have to provide these in the Selenium session. For example, when you try opening your URL in incognito mode or in a different browser on your machine, does it work? — Rithin Chalumuri, Nov 05 '19 at 21:51
@RithinChalumuri Yes my default browser is already logged in. If I run the url manually incognito, it will ask for login credentials. — Jonathan Lam, Nov 05 '19 at 21:54
That's the reason it's not working. Every Selenium session runs with no cookies/data from your other browser sessions. If you want to go down this route instead of using the API, you'll first have to write Selenium code that enters your credentials in the browser and then loads the page. — Rithin Chalumuri, Nov 05 '19 at 21:55

score 1 · Answer 1 · answered Nov 05 '19 at 21:17

Your browser opens simply because that is what webbrowser.open() is supposed to do: instead of sending an HTTP request itself, it launches the browser and passes it the URL. A possible solution is to use Selenium instead of webbrowser, because when I looked I didn't find a headless option for the package you are using. So here it is:

from selenium.webdriver import Chrome
from selenium.webdriver.chrome.options import Options

url = "<URL>"

chrome_options = Options()  
chrome_options.add_argument("--headless")

with Chrome(options=chrome_options) as browser:
    browser.get(url)

If this solution is not acceptable because you need to use webbrowser instead of Selenium, you would need to find a way to pass options to your browser instance. I didn't find a way, using dir() or help(), to pass this argument to webbrowser, but if I find something I will add it.

Rithin Chalumuri · Accepted Answer · 2019-11-05T21:59:12.630

requests.get() simply returns the markup received from the server in response to a GET request, without any further client-side execution.

In a browser, by contrast, a lot more can happen through client-side JavaScript. I haven't looked at your page specifically, but there might be JavaScript code doing further processing.
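To make the difference concrete, here is a minimal sketch that serves a page from a throwaway local server and fetches it with a plain HTTP client. It uses urllib.request as a stand-in for requests.get() so that it needs only the standard library. The page, the /do-refresh path, and the idea that the refresh is triggered by a script are all assumptions for illustration, not a description of how Tableau Server actually works.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical page: a browser would run the script and call /do-refresh.
PAGE = b"<html><body><script>fetch('/do-refresh');</script></body></html>"

hits = []  # every path the server receives a GET for


class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        hits.append(self.path)
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.end_headers()
        self.wfile.write(PAGE)

    def log_message(self, *args):
        pass  # keep the console quiet


def fetch_without_browser():
    """Start the local server, GET the page once, return (status, body)."""
    server = HTTPServer(("127.0.0.1", 0), Handler)  # port 0 = any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        # Bypass any proxy configured in the environment for this local call.
        opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
        url = f"http://127.0.0.1:{server.server_port}/"
        with opener.open(url) as resp:
            return resp.status, resp.read().decode()
    finally:
        server.shutdown()
        server.server_close()


status, body = fetch_without_browser()
print(status)  # the GET itself succeeds
print(hits)    # only the page was requested; the script never ran
```

The status code only says that the markup was delivered. Whatever the script inside that markup would have triggered in a browser never happens, which would match the "200 but no refresh" behaviour described in the question if the refresh depends on client-side code.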

Instead of webbrowser or requests, you can use Selenium. You can read more about it here.

Selenium lets you browse pages as you would in a browser, but also gives you the flexibility to automate and control actions on the page with Python code.

You could perhaps use the Selenium Chrome WebDriver to load the page in the background. (Or you can use the Firefox driver.)

Go to chrome://settings/help, check your current Chrome version, and download the driver for that version from here. Make sure to keep the driver file either in your PATH or in the same folder as your Python script.

Try this:

from selenium.webdriver import Chrome  # pip install selenium
from selenium.webdriver.chrome.options import Options

url = "http://servername/views/workbookname/dashboard1?:refresh=yes"

# Make it headless, i.e. run in the background without opening a Chrome window
chrome_options = Options()
chrome_options.add_argument("--headless")

# Use Chrome to get the page, including JavaScript-generated content.
# Note: executable_path is the Selenium 3 form; Selenium 4 instead takes
# service=Service("./chromedriver") from selenium.webdriver.chrome.service.
with Chrome(executable_path="./chromedriver", options=chrome_options) as browser:
    browser.get(url)
    page_source = browser.page_source

Note

When you open your URL, the webbrowser module launches your default browser, which already has your credentials/cookies cached. If your URL needs any authentication or login to access, you'll have to provide these when getting the page using Selenium. Think of each Selenium WebDriver session as an incognito session. Here's an example of how to simulate a login with WebDriver.
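A quick way to tell whether an unauthenticated session received the login form rather than the dashboard is to look for a password field in the returned markup. The sketch below is a hypothetical helper (looks_like_login_page is not part of Selenium or requests) built on the standard-library HTML parser, and the sample markup is invented. It only works when the form is present in the static HTML; if the login fields are generated by JavaScript or sit inside a frame, this check will miss them.

```python
from html.parser import HTMLParser


class _PasswordFieldFinder(HTMLParser):
    """Sets .found to True if any <input type="password"> tag is seen."""

    def __init__(self):
        super().__init__()
        self.found = False

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None.
        if tag == "input" and (dict(attrs).get("type") or "").lower() == "password":
            self.found = True


def looks_like_login_page(html):
    """Heuristic: a page containing a password input is probably a login form."""
    finder = _PasswordFieldFinder()
    finder.feed(html)
    return finder.found


# Invented sample markup, for illustration only.
login_html = '<form><input name="user"><input type="password" name="pw"></form>'
dashboard_html = '<div id="viz"><input type="text" name="filter"></div>'

print(looks_like_login_page(login_html))
print(looks_like_login_page(dashboard_html))
```

The same function can be applied to browser.page_source from Selenium or to the text of a requests response, so a successful status code is not mistaken for a successful refresh.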

References:

selenium - chromedriver executable needs to be in PATH

I have looked at the html tag it returns, yes it returns the login page. When I inspect the code in chrome, I can see the inner tags but on the return (print source page in python) it is not there? Have you ever encountered this? The username and password tag is inside the missing tag on the output. — Jonathan Lam, Nov 06 '19 at 15:26
@JonathanLam, I've encountered something similar in the past. But in my case, it was because the login form was hidden and I needed to first click the Login button and then I would see the email/password fields. If you know the ID of both fields you can set the values directly by executing javascript. Example: https://stackoverflow.com/a/58737372/11914067 — Rithin Chalumuri, Nov 06 '19 at 19:39
