
I would like to scrape this page with Python: https://statusinvest.com.br/acoes/proventos/ibovespa.

With this code:

import requests
from bs4 import BeautifulSoup as bs

URL = "https://statusinvest.com.br/acoes/proventos/ibovespa"

page = 1
req = requests.get(URL+str(page))
soup = bs(req.text, 'html.parser')
container = soup.find('div', attrs={'class': 'list'})
dividends = container.find('a')

for dividend in dividends:
    links = dividend.find_all('a')
    print(links)

But it doesn't return anything.

Can someone help me please?

    Welcome to SO! Please see: [Why is “Can someone help me?” not an actual question?](http://meta.stackoverflow.com/q/284236). What data are you trying to get here? Almost certainly, the data you want is being injected via JS. See [Web-scraping JavaScript page with Python](https://stackoverflow.com/questions/8049520/web-scraping-javascript-page-with-python) – ggorlen Apr 05 '21 at 00:43
  • Have you looked at the source code for this page? The data is all stored in a single hidden `<input>` tag with `id="result"`. JavaScript code expands that on the fly to create the page. Since you aren't executing the JavaScript, you won't see that. However, you may be able to get what you want just by reading that one `<input>` tag. – Tim Roberts Apr 05 '21 at 00:51

1 Answer


Edit: the updated code below accesses all the data you mentioned in the comments. You can adapt it to your needs, since everything on that page ends up inside the `data` variable.

Updated Code:

import json
import requests
from bs4 import BeautifulSoup as bs

url = "https://statusinvest.com.br"

req = requests.get(f"{url}/acoes/proventos/ibovespa")
soup = bs(req.content, 'html.parser')
# The page embeds all of its data as JSON in a hidden <input id="result"> element
data = json.loads(soup.find('input', attrs={'id': 'result'})["value"])

fields = ['code', 'companyName', 'companyNameClean', 'companyId',
          'resultAbsoluteValue', 'dateCom', 'paymentDividend',
          'earningType', 'dy', 'recentEvents', 'uRLClear']

for section, title in [("dateCom", "Date Com Data"),
                       ("datePayment", "Date Payment Data"),
                       ("provisioned", "Provisioned Data")]:
    print(f"\n{title}")
    for row in data[section]:
        print("\t".join(str(row[field]) for field in fields))

Looking at the page's source, you can fetch the JSON directly and build your desired links with the code below.

Code:

import json
import requests
from bs4 import BeautifulSoup as bs

url = "https://statusinvest.com.br"

links = []
req = requests.get(f"{url}/acoes/proventos/ibovespa")
soup = bs(req.content, 'html.parser')
# The hidden <input id="result"> element holds the page data as JSON
data = json.loads(soup.find('input', attrs={'id': 'result'})["value"])
for section in ("dateCom", "datePayment", "provisioned"):
    for row in data[section]:
        links.append(f"{url}{row['uRLClear']}")

print(links)

Output: a list of the scraped links (42 at the time of writing).

Let me know if you have any questions :)

  • Great solution. But in fact I want to extract all the values of dateCom, datePayment and provisioned. Is that possible? – edoski Apr 05 '21 at 23:28
  • Looks like you misunderstood: the code fetches the links of all 3 types. You can run it and compare the number of links on the website with what the bot outputs; currently it's 42. Let me know if I am wrong somewhere. – pb36 Apr 06 '21 at 04:07
  • Yes, I asked the wrong question, sorry. In fact I want to extract all the data from the respective fields: stock, value of dividend, data com, payment, type and DY. – edoski Apr 06 '21 at 16:47
  • Is it possible to apply customized calendar filters on the site? I want to extract all the data since the year 2000. – edoski Apr 06 '21 at 23:55
  • Check this [URL](https://statusinvest.com.br/acao/getearnings?IndiceCode=ibovespa&Filter=&Start=1999-12-31&End=2021-05-07): it returns data from the start date 1999-12-31 up to the end date, as seen on the network tab, in the same JSON format with all the data. You can modify the bot to call this URL. Let me know if you have any other questions. :) – pb36 Apr 07 '21 at 04:58
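Following up on that last comment, here is a minimal sketch for building that filtered URL for any date range. It assumes the endpoint and the parameter names (`IndiceCode`, `Filter`, `Start`, `End`) observed in the browser's network tab are unchanged; the actual fetch is commented out because it needs network access and the site may require a browser-like User-Agent.

```python
from urllib.parse import urlencode

# Endpoint and parameter names taken from the network tab;
# they may change if the site is updated.
BASE = "https://statusinvest.com.br/acao/getearnings"

def build_earnings_url(start, end, indice_code="ibovespa"):
    """Build the earnings URL filtered to the given YYYY-MM-DD date range."""
    params = {"IndiceCode": indice_code, "Filter": "", "Start": start, "End": end}
    return f"{BASE}?{urlencode(params)}"

print(build_earnings_url("2000-01-01", "2021-05-07"))

# Then fetch and parse it, e.g.:
# import requests
# data = requests.get(build_earnings_url("2000-01-01", "2021-05-07"),
#                     headers={"User-Agent": "Mozilla/5.0"}).json()
```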