
I'm trying to scrape data from this website: https://www.rad.cvm.gov.br/ENETCONSULTA/frmGerenciaPaginaFRE.aspx?NumeroSequencialDocumento=102142&CodigoTipoInstituicao=2. After switching from "Demonstração do Resultado" to "Balanço Patrimonial Ativo" in the box at the upper right, the whole table sits under the CSS selector "#ctl00_cphPopUp_tbDados", but I can't get its data using Selenium WebDriver. I think the table is dynamic and is loaded by a script, but I don't know any other way to get this data. Here is the complete code so far:

import time

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys

cvm = input('Códigos CVM separados por vírgula: ')
lstcvm = list(map(str,cvm.split(',')))
for i in lstcvm:
    url="https://bvmf.bmfbovespa.com.br/cias-listadas/empresas-listadas/ResumoDemonstrativosFinanceiros.aspx?codigoCvm="+i+"&idioma=pt-br"
    driver = webdriver.Firefox()
    driver.get(url)
    # Click the first DFP document link, which opens the report in a new window
    dfp = driver.find_element(By.CSS_SELECTOR, "#ctl00_contentPlaceHolderConteudo_rptDocumentosDFP_ctl00_lnkDocumento")
    webdriver.ActionChains(driver).click(dfp).perform()
    time.sleep(10)
    tabs=driver.window_handles
    driver.switch_to.window(tabs[1])
    print(driver.current_url)
    # Move the statement box to its first entry
    box = driver.find_element(By.CSS_SELECTOR, "#cmbQuadro")
    box.send_keys(Keys.HOME, Keys.RETURN)
    time.sleep(1)
    driver.maximize_window()
    time.sleep(1)
    # This is the lookup that does not return the table data
    balanco=driver.find_element(By.CSS_SELECTOR, "#ctl00_cphPopUp_tbDados").text
    print(balanco)
    driver.switch_to.window(tabs[0])
    print(driver.current_url)
    print("Finalizado")

The sample input here is 9512

The part of the code that tries to scrape the data is this line:

balanco=driver.find_element(By.CSS_SELECTOR, "#ctl00_cphPopUp_tbDados").text
undetected Selenium
Capuccino

1 Answer


To select Balanço Patrimonial Ativo and then extract the data from the DFs Consolidadas / Balanço Patrimonial Ativo - (Reais Mil) table, you need to induce WebDriverWait for visibility_of_element_located() on the select box, switch to the iframe that contains the table, wait for the table itself, and read its outerHTML into a pandas DataFrame. You can use the following locator strategies:

Code Block:

from io import StringIO

import pandas as pd
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import Select, WebDriverWait

driver = webdriver.Chrome()
driver.get("https://www.rad.cvm.gov.br/ENETCONSULTA/frmGerenciaPaginaFRE.aspx?NumeroSequencialDocumento=102142&CodigoTipoInstituicao=2")
# Choose the statement in the select box, which is located before switching frames
Select(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "select#cmbQuadro")))).select_by_visible_text("Balanço Patrimonial Ativo")
# The table is rendered inside a child iframe, so switch into it first
WebDriverWait(driver, 20).until(EC.frame_to_be_available_and_switch_to_it((By.CSS_SELECTOR,"iframe#iFrameFormulariosFilho")))
# Wait until the table is visible, then grab its HTML
data = WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "table#ctl00_cphPopUp_tbDados"))).get_attribute("outerHTML")
# read_html returns a list of DataFrames; StringIO avoids the literal-HTML deprecation warning in recent pandas
df = pd.read_html(StringIO(data))
print(df)
driver.quit()

Console Output:

[                0                              1            2            3
0           Conta                      Descrição   31/12/2020   31/12/2019
1               1                    Ativo Total  987.419.000  926.011.000
2            1.01               Ativo Circulante  142.323.000  112.101.000
3         1.01.01  Caixa e Equivalentes de Caixa   60.856.000   29.714.000
4         1.01.02         Aplicações Financeiras    3.424.000    3.580.000
..            ...                            ...          ...          ...
61     1.02.03.03       Imobilizado em Andamento          NaN          NaN
62        1.02.04                     Intangível   77.678.000   78.489.000
63     1.02.04.01                    Intangíveis          NaN          NaN
64  1.02.04.01.01          Contrato de Concessão          NaN          NaN
65     1.02.04.02                       Goodwill          NaN          NaN

[66 rows x 4 columns]]
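
In the output above the first row holds the column names, and the amounts are still strings that use "." as the thousands separator. The following is a minimal sketch of one way to tidy that up, written in plain Python so it runs without a browser. The helper names parse_reais_mil and promote_header are hypothetical (they are not part of Selenium or pandas), and the sketch assumes the amounts are whole numbers with no decimal part. The sample rows are copied from the console output.

```python
def parse_reais_mil(value):
    """Convert a string such as '987.419.000' to an int.

    Assumes '.' is a thousands separator and there is no decimal part.
    Returns None for missing cells (None, NaN or empty strings).
    """
    # NaN is the only value that is not equal to itself
    if value is None or value != value:
        return None
    cleaned = str(value).strip().replace(".", "")
    if not cleaned:
        return None
    return int(cleaned)


def promote_header(rows):
    """Split a list of rows into (header, body), using the first row as header."""
    header, *body = rows
    return list(header), [list(row) for row in body]


# Sample rows taken from the console output above
rows = [
    ["Conta", "Descrição", "31/12/2020", "31/12/2019"],
    ["1", "Ativo Total", "987.419.000", "926.011.000"],
    ["1.01", "Ativo Circulante", "142.323.000", "112.101.000"],
    ["1.02.04.02", "Goodwill", float("nan"), float("nan")],
]

header, body = promote_header(rows)
# Only the date columns (index 2 onwards) are amounts; account codes such as
# "1.01" in the first column must stay as text.
records = [
    dict(zip(header, row[:2] + [parse_reais_mil(cell) for cell in row[2:]]))
    for row in body
]
print(records[0])
```

With the real result, the rows would come from df[0] (the first DataFrame in the list), and the same helper could be applied to each date column with Series.map.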
undetected Selenium