I have an .xlsx file that I want to download with Python. If I click on the following URL https://www.science.org/doi/suppl/10.1126/science.aad0501/suppl_file/aad0501_table_s5.xlsx my browser downloads the file automatically with no problems. However, the following code
import requests
url = "https://www.science.org/doi/suppl/10.1126/science.aad0501/suppl_file/aad0501_table_s5.xlsx"
with open("this_is_a_test.xlsx", "wb") as f:
    r = requests.get(url)
    f.write(r.content)
print(r.ok)
outputs True
but saves an HTML page instead of the xlsx file. What is even more frustrating is that the same code worked perfectly fine before; for some reason its behaviour changed within the last 24 hours.
This thread and this thread discuss similar problems; however, both cases involve a login barrier, which is not present in my case.
EDIT 1: After executing the code above and typing head this_is_a_test.xlsx in my terminal, this is the output I get:
<!DOCTYPE html>
<html lang="en" class="pb-page" data-request-id="fe043004-5c5a-4d2e-a323-cc9b39aa3339"><head data-pb-dropzone="head"><meta name="pbContext" content=";wgroup:string:Publication Websites;page:string:Cookie Absent;website:website:aaas-site" />
<script>AAASdataLayer={"page":{"pageInfo":{"pageTitle":"","pageURL":"https://www.science.org/action/cookieAbsent"},"attributes":{}},"user":{}};if(AAASdataLayer&&AAASdataLayer.user){let match=document.cookie&&document.cookie.match(/(?:^|; )consent=([^;]*)/);if(match){let jsonObj=JSON.parse(decodeURIComponent(match[1]));AAASdataLayer.user.cookieConsent=jsonObj.Marketing?'true':'false';}}</script> <link type="text/css" rel="stylesheet" href="/pb-assets/css/local-1639500397097.css">
<title>AAAS</title>
<meta charset="UTF-8">
<meta name="robots" content="noarchive,noindex,nofollow" />
<meta property="og:title" content="AAAS" />
<meta property="og:type" content="Website" />
<meta property="og:site_name" content="AAAS" />
<meta name="viewport" content="width=device-width,initial-scale=1,maximum-scale=1, user-scalable=0" />
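Since r.ok is True even when this HTML page comes back, the status code alone cannot tell the two outcomes apart. An .xlsx file is a ZIP archive that contains a part named [Content_Types].xml, so one way to detect the problem is to inspect the downloaded bytes before writing them to disk. The following is only a minimal sketch of such a check; the helper name looks_like_xlsx is made up for illustration, and it does nothing to fix the download itself.

```python
import io
import zipfile


def looks_like_xlsx(data: bytes) -> bool:
    """Return True if `data` is a ZIP archive with the part every .xlsx contains.

    Hypothetical helper: it only inspects the bytes, it does not download anything.
    """
    buffer = io.BytesIO(data)
    # An HTML page is not a ZIP archive, so it is rejected here.
    if not zipfile.is_zipfile(buffer):
        return False
    with zipfile.ZipFile(buffer) as archive:
        # Office Open XML packages always carry this content-type manifest.
        return "[Content_Types].xml" in archive.namelist()


# The start of the HTML page shown above is not accepted as a spreadsheet.
html_page = b'<!DOCTYPE html>\n<html lang="en" class="pb-page"><head></head></html>'
print(looks_like_xlsx(html_page))  # False
```

In the script above, this would be called as looks_like_xlsx(r.content) before opening the output file, so that an HTML response is reported instead of being silently saved under an .xlsx name.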
EDIT 2: Okay, so apparently the code does download the Excel file the first time it is executed, but returns the HTML page when executed a second time. Downloading the file manually (by clicking on the link) still works, so I guess there might be a workaround for this?