
I am trying to scrape intraday prices for a company from this website: Enel Intraday

When the website returns the data, it splits it across a few hundred pages, which makes it very time-consuming to collect. Using insomnia.rest (for the first time), I have been experimenting with the GET URL and trying to find the actual JavaScript function that returns these table values, but without success.

Having inspected the search button, I found that the JS function is called "searchIntraday" and takes a form called "intraday_form" as input.


I am basically trying to get the following data in one call rather than having to go through every page, so a full day would look like this:

Time        Last Trade Price  Var %  Last Volume  Type
5:40:49 PM  7.855             -2.88  570          AT
5:38:17 PM  7.855             -2.88  300          AT
5:37:10 PM  7.855             -2.88  290          AT
5:36:06 PM  7.855             -2.88  850          AT
5:35:56 PM  7.855             -2.88  14,508,309   UT
5:29:59 PM  7.872             -2.67  260          AT
5:29:59 PM  7.871             -2.68  4,300        AT
5:29:59 PM  7.872             -2.67  439          AT
5:29:59 PM  7.872             -2.67  3,575        AT
5:29:59 PM  7.87              -2.7   1,000        AT
5:29:59 PM  7.87              -2.7   1,000        AT
5:29:59 PM  7.87              -2.7   1,000        AT
5:29:59 PM  7.87              -2.7   4,000        AT
5:29:59 PM  7.87              -2.7   300          AT
5:29:59 PM  7.87              -2.7   2,000        AT
5:29:59 PM  7.87              -2.7   200          AT
5:29:59 PM  7.87              -2.7   400          AT
5:29:59 PM  7.87              -2.7   500          AT
5:29:59 PM  7.872             -2.67  1,812        AT
5:29:59 PM  7.872             -2.67  5,000        AT
...
9:00:07 AM  8.1               0.15   933,945      UT

For that day, this means iterating from page 1 to page 1017!

I looked at the pages below for help:

JS Scrape article

Similar Stack Overflow issue with answer

Screen copy of Insomnia report

Je Je
  • Can you explain exactly what data you are trying to get? – Cohan Jan 31 '20 at 20:49
  • I further edited question. Tx – Je Je Jan 31 '20 at 21:16
  • I updated my answer a bit. But you're probably going to have to iterate through each page. Good news is that the computer doesn't mind churning while you get yourself a cup of coffee. – Cohan Jan 31 '20 at 21:21
  • Well, the problem is that if I want to do that for 30 stocks, my computer takes more than a day to do it, and then the next day the data gets wiped out... I was basically trying to 'hack' the query (without success so far) in order to shorten the process. – Je Je Jan 31 '20 at 21:28

1 Answer

The data doesn't appear to be generated by JavaScript, but rather by loading pages. The image below is the response I get when I load the link below. You can see that the location of the request matches the location on the page and that the HTML for the table is sent along with the page response.

The HTML in the response indicates that the pages are generated on the server side rather than the client side. Unfortunately, unless you find a URL or query parameter that returns all the results you want in one shot, you're going to have to iterate through each page. If you do manage to find such a magic URL, you can just process that one instead.

https://www.borsaitaliana.it/borsa/azioni/contratti.html?isin=IT0003128367&lang=en&page=10

[Image: the page response for the link above, with the table HTML included]
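Since the number of pages changes from day to day, a loop that stops on its own may be handier than a fixed page count. Here is a minimal sketch of that idea. The names `scrape_until_empty` and `fetch_rows_from_site` are hypothetical, and the stop condition is an assumption: I have not checked what the site actually returns for a page past the last one, so this guesses that such a page contains no table rows. The `max_pages` cap is there in case that guess is wrong.

```python
from typing import Callable, List

Row = List[str]


def scrape_until_empty(
    fetch_rows: Callable[[int], List[Row]], max_pages: int = 2000
) -> List[Row]:
    """Collect rows page by page, stopping at the first page with no rows."""
    all_rows: List[Row] = []
    for page in range(1, max_pages + 1):  # pages are numbered from 1
        rows = fetch_rows(page)
        if not rows:  # assumption: pages past the end yield no table rows
            break
        all_rows.extend(rows)
    return all_rows


def fetch_rows_from_site(page: int) -> List[Row]:
    """Hypothetical fetcher for the paginated URL shown above."""
    # Imported here so the loop above can be tried without these packages.
    import io

    import pandas as pd
    import requests

    url = (
        "https://www.borsaitaliana.it/borsa/azioni/contratti.html"
        "?isin=IT0003128367&lang=en&page="
    )
    response = requests.get(url + str(page), timeout=30)
    response.raise_for_status()
    try:
        tables = pd.read_html(io.StringIO(response.text))
    except ValueError:  # pandas raises ValueError when it finds no table
        return []
    return tables[0].astype(str).values.tolist()
```

Calling `scrape_until_empty(fetch_rows_from_site)` would then walk the pages in order. Passing the fetcher in as an argument also makes it easy to try the loop against a fake fetcher before hitting the real site.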

I decided to give it a whirl to see what kind of performance I could get. Below is a complete script that iterates through the first 100 pages.

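import io

import pandas as pd
import requests

# Base URL of the paginated intraday table; the page number is appended.
url = "https://www.borsaitaliana.it/borsa/azioni/contratti.html?isin=IT0003128367&lang=en&page="

# Pages are numbered from 1, so this fetches the first 100 pages.
# read_html returns every table on the page; the first one holds the trades.
df = pd.concat([
    pd.read_html(io.StringIO(requests.get(url + str(page)).text))[0]
    for page in range(1, 101)
])

# Write all rows to a single CSV file.
df.to_csv('enel.csv', index=False)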

Running it on my machine, it took about 1 minute 17 seconds for 100 pages.

$ time python scrape.py 

real    1m16.914s
user    0m4.039s
sys     0m0.729s

At roughly 0.77 seconds per page, the 1,017 pages in your example would take about 13 minutes, so call it 15 minutes per stock. That comes to about 7.5 hours for 30 stocks, assuming they're all about the same length. You could run that overnight and it would be ready for you in the morning.
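Most of that time is spent waiting on the network (note how small `user` and `sys` are compared with `real`), so fetching several pages at once might cut it down. Below is a minimal sketch using a thread pool from the standard library. The function names and the `max_workers=8` value are my own choices for illustration, I have not measured the speedup, and the site may throttle or block many simultaneous requests, so keep the worker count modest.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")


def fetch_pages_concurrently(
    fetch_page: Callable[[int], T], pages: Iterable[int], max_workers: int = 8
) -> List[T]:
    """Run fetch_page over all page numbers on a pool of worker threads."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map returns results in the same order as the input pages.
        return list(pool.map(fetch_page, pages))


def fetch_table(page: int):
    """Hypothetical fetcher: download one page and return its first table."""
    # Imported here so the helper above has no third-party dependencies.
    import io

    import pandas as pd
    import requests

    url = (
        "https://www.borsaitaliana.it/borsa/azioni/contratti.html"
        "?isin=IT0003128367&lang=en&page="
    )
    response = requests.get(url + str(page), timeout=30)
    response.raise_for_status()
    return pd.read_html(io.StringIO(response.text))[0]
```

With those in place, `pd.concat(fetch_pages_concurrently(fetch_table, range(1, 101)))` should produce the same frame as the sequential script, with rows still in page order.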

Cohan
  • I have edited the question with the JS function I think I am after. The idea is to find out what parameters I can add to the query, including the number of rows to show per page, fromdate, and todate. That is, if it is possible, by editing the GET query or the JS function parameters... Tx – Je Je Jan 31 '20 at 19:56