The code below loops over a list of tickers and, for each one, prints a table of data scraped from a website.

How can I get the data from this output table into a new, organized table?

I need to get the 'Códigos de Negociação' and the 'CNPJ' into this new table.

This is a sample of the scraped table

                        0                                               1
0          Nome de Pregão                                     FII ALIANZA
1   Códigos de Negociação                                          ALZR11
2                    CNPJ                              28.737.771/0001-85
3  Classificação Setorial  Financeiro e Outros/Fundos/Fundos Imobiliários
4                    Site                              www.btgpactual.com

This is the code

import pandas as pd

list = pd.read_html('http://bvmf.bmfbovespa.com.br/Fundos-Listados/FundosListados.aspx?tipoFundo=imobiliario&Idioma=pt-br')[0]
Tickers = list['Código'].tolist()

removechars = str.maketrans('', '', './-')
for i in Tickers:
    try:
        df = pd.read_html("http://bvmf.bmfbovespa.com.br/Fundos-Listados/FundosListadosDetalhe.aspx?Sigla="+i+"&tipoFundo=Imobiliario&aba=abaPrincipal&idioma=pt-br")[0]
        print(df)
    except:
        print('y')

I would also like to apply removechars to the CNPJ, to strip the dots, slashes and dashes from it.

Expected result:

                   Código                                            CNPJ
0                 ALZR11                                   28737771000185
guialmachado
  • You may want to use the Beautiful Soup library: https://www.crummy.com/software/BeautifulSoup/ – Arne Apr 13 '20 at 19:16
  • I don't believe it's necessary, because the table is already printed using just pandas. I want to take elements from this DataFrame and add them to a new DataFrame – guialmachado Apr 13 '20 at 19:18
  • Okay, then take a look here: https://stackoverflow.com/questions/3939361/remove-specific-characters-from-a-string-in-python – Arne Apr 13 '20 at 19:23
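
For the CNPJ cleanup discussed in the question and the linked thread, here is a minimal sketch using the built-in str.translate with the same removechars table. The helper name clean_cnpj is only illustrative and does not appear in the original code.

```python
# Translation table that replaces nothing and deletes '.', '/' and '-'.
removechars = str.maketrans('', '', './-')


def clean_cnpj(cnpj):
    """Return the CNPJ without dots, slashes or dashes (hypothetical helper)."""
    return cnpj.translate(removechars)


# Sample value taken from the scraped table in the question.
cleaned = clean_cnpj('28.737.771/0001-85')
print(cleaned)
```

The table is built once and can be reused for every value, which fits a loop over many tickers.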

1 Answer

This code worked for me

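import pandas as pd

# Listing page: one row per fund, with the ticker in the 'Código' column.
# Named 'funds' rather than 'list' to avoid shadowing the built-in.
funds = pd.read_html('http://bvmf.bmfbovespa.com.br/Fundos-Listados/FundosListados.aspx?tipoFundo=imobiliario&Idioma=pt-br')[0]
tickers = funds['Código'].tolist()
print(funds)

cnpjs = []
codigos = []
# Translation table that deletes dots, slashes and dashes.
removechars = str.maketrans('', '', './-')
for ticker in tickers:
    try:
        df = pd.read_html("http://bvmf.bmfbovespa.com.br/Fundos-Listados/FundosListadosDetalhe.aspx?Sigla="+ticker+"&tipoFundo=Imobiliario&aba=abaPrincipal&idioma=pt-br")[0]
        print(df)
        # Row 1 is 'Códigos de Negociação' and row 2 is 'CNPJ'; column 1 holds the value.
        codigo = df.at[1, 1]
        cnpj = df.at[2, 1].translate(removechars)
        # Append only after both lookups succeed, so the lists stay aligned.
        codigos.append(codigo)
        cnpjs.append(cnpj)
    except Exception as exc:
        # Report the failing ticker instead of hiding the error.
        print('Skipping', ticker, '-', exc)

# Build the result table once, after the loop has collected every row.
df2 = pd.DataFrame({'Codigo': codigos, 'CNPJ': cnpjs})
print(df2)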
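
The lookups above depend on the row positions (1 and 2) being the same on every detail page. As a hedged alternative, the sketch below selects the fields by their labels instead. It runs offline on the sample table from the question. The helper name extract_row and the variable names are assumptions for illustration, not part of the original code.

```python
import pandas as pd

# Sample detail table copied from the question, so no network access is needed.
sample = pd.DataFrame([
    ['Nome de Pregão', 'FII ALIANZA'],
    ['Códigos de Negociação', 'ALZR11'],
    ['CNPJ', '28.737.771/0001-85'],
    ['Classificação Setorial', 'Financeiro e Outros/Fundos/Fundos Imobiliários'],
    ['Site', 'www.btgpactual.com'],
])

# Translation table that deletes dots, slashes and dashes.
removechars = str.maketrans('', '', './-')


def extract_row(detail):
    """Hypothetical helper: pick the two fields by label, not by row position."""
    # Column 0 holds the labels and column 1 holds the values.
    fields = detail.set_index(0)[1]
    return {
        'Código': fields['Códigos de Negociação'],
        'CNPJ': fields['CNPJ'].translate(removechars),
    }


# In the real loop, each scraped detail table would contribute one dict here.
rows = [extract_row(sample)]
result = pd.DataFrame(rows)
print(result)
```

Collecting one dict per fund and building the DataFrame at the end keeps each Código paired with its CNPJ, assuming every page uses the same two labels.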
guialmachado