Web-scraping w/ Python: make my web scraping code faster?

Question

I would like to scrape two tables from 2 links. My code is:

import pandas as pd
import xlwings as xw
from datetime import datetime

def last_row(symbol, name):
    # Return True if the last row of the df should be deleted, i.e. if both
    # requirements below hold: Symbol is "total" and Name consists of digits.
    # The deletion itself is performed in get_foreigncompanies_info().
    requirements = [symbol.lower() == "total", name.isdigit()]
    return all(requirements)

def get_foreigncompanies_info():
    df_list = []
    links = ["https://stockmarketmba.com/nonuscompaniesonusexchanges.php",
              "https://stockmarketmba.com/listofadrs.php"]
    for i in links:

        #Reads table with pandas read_html and only save the necessary columns.

        df = pd.read_html(i)[0][['Symbol', 'Name', 'GICS Sector']] 
        if last_row(df.iloc[-1]['Symbol'], df.iloc[-1]['Name']):

            # Delete the last row

            df_list.append(df.iloc[:-1])
        else:

            # Keep last row

            df_list.append(df)
    return pd.concat(df_list).reset_index(drop=True).rename(columns={'Name': 'Security'})

def open_in_excel(dataframe):  # Code to view my df in excel.
    xw.view(dataframe)
    
if __name__ == "__main__":
    start = datetime.now()
    df = get_foreigncompanies_info()
    print(datetime.now() - start)
    # Reuse the df fetched above instead of requesting both pages a second time.
    open_in_excel(df)

The code took seconds to run.

I would like to make the code run faster (in a way that doesn't make too many unnecessary requests). My idea is to download the table as a CSV, since the website has a "download csv" button.

How could I download the CSV with Python?

I have inspected the button but couldn't find the URL behind it. (If you can find it, please also describe how you found it, perhaps with an "inspect" screenshot.)

Or is there any other, faster way to download the tables?

Thank you for any pointers :-)

If all you want is to download the csv, this answer may be helpful. https://stackoverflow.com/questions/16283799/how-to-read-a-csv-file-from-a-url-with-python — Forensic_07, Mar 29 '21 at 20:18
It is not that straightforward, because I could not find the link to download the csv. I have already inspected the website with no success. — gunardilin, Mar 29 '21 at 20:43

score 1 · Answer 1 · answered Apr 22 '21 at 14:41

You could use selenium to automate clicking on the button. It's not hard but a lot of effort for something so trivial. I don't like scraping but sometimes it's all we have, right?
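
Separately from the CSV button, the code in the question downloads the two pages one after the other, so fetching them in parallel threads may cut the waiting time while still making exactly one request per link. This is only a minimal sketch, assuming most of the time is spent waiting on the network. `fetch_tables` and `read_table` are hypothetical names introduced here; they are not part of pandas.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_tables(links, read_table, max_workers=None):
    """Call read_table(link) for every link using a thread pool.

    `read_table` is any callable that takes one URL and returns a table
    (hypothetical helper; see the usage comment below).
    Results are returned in the same order as `links`.
    """
    links = list(links)
    if not links:
        return []
    # Default to one thread per link; with two links that means two threads.
    workers = max_workers or len(links)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map keeps the input order and re-raises any worker exception.
        return list(pool.map(read_table, links))


# Possible use inside get_foreigncompanies_info() (needs pandas and network):
#   tables = fetch_tables(
#       links,
#       lambda url: pd.read_html(url)[0][['Symbol', 'Name', 'GICS Sector']],
#   )
```

The rest of the function (dropping the "total" row and concatenating) would stay as it is. Whether this helps depends on where the time actually goes; if parsing the HTML dominates rather than the download, threads will gain little.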

antigraviton


Thank you for your reply. I tried to use selenium, but after inspecting the button, I could not find the URL to the CSV. I hope you can give me more pointers here... – gunardilin Apr 22 '21 at 15:01
