
I have a web link that downloads an Excel file directly. It opens a page saying "your file is downloading" and then starts downloading the file.

Is there any way I can automate this using the requests module?

I am able to do it with Selenium, but I want it to run in the background, so I was wondering if I could use the requests module instead.

I have used requests.get, but it simply returns the page text, i.e. "your file is downloading"; somehow I am not able to get the file itself.

Amit Raj
  • Do you have to use Python? The `wget` command seems like a much better fit for this. – John Gordon Jul 12 '19 at 03:33
  • If the page uses JavaScript to start the download (redirecting to the real link), then you can't do this with requests, which can't run JavaScript. You can only find the URL used by the JavaScript and use that with requests. – furas Jul 12 '19 at 03:34

2 Answers


This Python 3 code downloads a file from the web into memory:

import requests
from io import BytesIO

url = 'your.link/path'

def get_file_data(url):
    # Stream the response so large files are not held in one chunk
    response = requests.get(url, stream=True)
    response.raise_for_status()  # fail early on HTTP errors
    f = BytesIO()
    for chunk in response.iter_content(chunk_size=1024):
        f.write(chunk)
    f.seek(0)  # rewind so the buffer can be read from the start
    return f

data = get_file_data(url)
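As an aside on the snippet above, the `f.seek(0)` call is what makes the returned buffer readable: writing leaves the stream position at the end, so a reader would otherwise get no data back. A minimal standard-library sketch:

```python
from io import BytesIO

buf = BytesIO()
buf.write(b"spreadsheet bytes")

# Without rewinding, the read position sits at the end of the buffer
print(buf.read())   # b''

buf.seek(0)         # rewind to the start
print(buf.read())   # b'spreadsheet bytes'
```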

You can then use the following code to read the Excel file:

import pandas as pd

xlsx = pd.read_excel(data, skiprows=0)
print(xlsx)
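If you would rather keep a copy on disk instead of (or as well as) reading it in memory, you can copy the buffer straight to a file. This sketch uses a stand-in `BytesIO` in place of the downloaded data and a temporary file path, both of which are assumptions for illustration:

```python
import os
import shutil
import tempfile
from io import BytesIO

# Stand-in for the buffer returned by get_file_data(url)
data = BytesIO(b"fake xlsx bytes")

# Copy the in-memory stream to a temporary .xlsx file on disk
with tempfile.NamedTemporaryFile(delete=False, suffix=".xlsx") as out:
    shutil.copyfileobj(data, out)
    path = out.name

print(os.path.getsize(path))  # 15, the length of the stand-in content
os.remove(path)
```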

It sounds like you don't actually have a direct URL to the file, and instead need to engage with some JavaScript. Perhaps there is an underlying network call, which you can find by inspecting the page traffic in your browser, that exposes a direct URL for downloading the file. With that, you can read the Excel file URL directly with pandas:

import pandas as pd

url = "https://example.com/some_file.xlsx"
df = pd.read_excel(url)
print(df)

This is nice and tidy, but if you really want to use requests (or avoid pandas) you can download the raw file content as shown in the other answer and then use the pyexcel_xlsx package's get_data function to read it without any pandas involvement.

totalhack