
How do I get the data from this Excel link into Python? https://www.centrodeinformacao.ren.pt/userControls/GetExcel.aspx?T=CRG&P=02-10-2020&variation=PT

Ismael Padilla
mxs

1 Answer


The easiest way is to use pandas.read_html(), which returns a list of DataFrames, one per HTML table found at the URL:

import pandas as pd

df = pd.read_html('https://www.centrodeinformacao.ren.pt/userControls/GetExcel.aspx?T=CRG&P=02-10-2020&variation=PT')[0]
print(df)

Prints:

            0      1       2     3            4           5             6           7           8               9            10          11                12         13        14       15
0         Data   Hora  Carvão  Fuel  Gás Natural  Albufeiras  Fios de Água  Importação  Exportação  PRE Hidráulico  PRE Térmico  PRE Eólica  PRE Fotovoltaico  PRE Ondas  Bombagem  Consumo
1   02-10-2020  00:00      00    00         2176          06           990        3395          00             117         9371       41010                00       0000      5892    50700
2   02-10-2020  00:15      00    00         2028          05           995        3850          00             128         9154       41336                00       0000      6808    50219
3   02-10-2020  00:30      00    00         1888          05           426        4006          00             136         9047       42058                00       0000      6884    50252
4   02-10-2020  00:45      00    00         2300          05            01        4083          00             134         9020       42060                00       0000      7754    49402
..         ...    ...     ...   ...          ...         ...           ...         ...         ...             ...          ...         ...               ...        ...       ...      ...
92  02-10-2020  22:45      00    00         2524         144          2246        6809          00             315        10664       32559                00       0000        21    54947
93  02-10-2020  23:00      00    00         2272        1954          2132        7318          00             281        10584       32081                00       0000      3286    53083
94  02-10-2020  23:15      00    00         2104        1956          2037        7413          00             283        10525       31210                00       0000      3546    51721
95  02-10-2020  23:30      00    00         2320        1130          2092        7402          00             282        10640       30667                00       0000      3413    50845
96  02-10-2020  23:45      00    00         2244         763          1963        7891          00             288        10555       29770                00       0000      3507    49653

[97 rows x 16 columns]
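
One caveat about the output above: values such as 00 and 0000 suggest that the source uses decimal commas (0,0 and 0,000), which read_html() strips because its default thousands separator is a comma. The header row also ends up as data row 0. A minimal sketch of a fix, assuming that reading of the data is right, passes header=0, decimal="," and thousands=".". The sample HTML is invented for illustration so the snippet runs offline, and read_ren_table is a hypothetical helper name, not part of pandas:

```python
from io import StringIO

import pandas as pd

# Hypothetical excerpt in the same shape as the REN table.
# The numbers are made up; only the decimal-comma format matters here.
SAMPLE_HTML = """
<table>
  <tr><td>Data</td><td>Hora</td><td>Gás Natural</td><td>Consumo</td></tr>
  <tr><td>02-10-2020</td><td>00:00</td><td>217,6</td><td>5070,0</td></tr>
  <tr><td>02-10-2020</td><td>00:15</td><td>202,8</td><td>5021,9</td></tr>
</table>
"""


def read_ren_table(source):
    """Return the first HTML table, with a real header and decimal commas parsed."""
    tables = pd.read_html(
        source,
        header=0,       # use the first row as column names
        decimal=",",    # "217,6" is read as 217.6
        thousands=".",  # override the default "," thousands separator
    )
    return tables[0]


df = read_ren_table(StringIO(SAMPLE_HTML))
print(df.dtypes)
```

With the real data, pass the URL string to read_ren_table() instead of the StringIO object.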
Andrej Kesely
  • Thanks, Andrej Kesely. After that, can I import that table into MySQL from Python? How? – mxs Oct 12 '20 at 10:43
  • @mxs I would start here: https://stackoverflow.com/questions/16476413/how-to-insert-pandas-dataframe-via-mysqldb-into-database – Andrej Kesely Oct 12 '20 at 10:44
  • Can you take a look at this as well? https://stackoverflow.com/questions/64282670/scrapy-can%c2%b4t-scrape-multiple-tables-at-once Thanks a lot. – mxs Oct 12 '20 at 10:45
  • @mxs I'd love to help you with that, but I don't have experience with scrapy (only pandas/beautifulsoup). – Andrej Kesely Oct 12 '20 at 10:47
  • Andrej Kesely, what about the following? https://stackoverflow.com/questions/64460619/pd-read-html-error-client-remote-disconnected – mxs Oct 21 '20 at 09:38
  • Andrej Kesely, why can't I extract anything from this link? It always gives me "list index out of range": df = pd.read_html('https://www.iesoe.eu/iesoe/ProxyServlet?fileName=FLW_INT_DD_AA_20200101&fileType=xls&idioma=es')[0] – mxs Oct 21 '20 at 13:44
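
On the follow-up question about loading the table into MySQL: one common route is DataFrame.to_sql(). The sketch below is only an illustration under stated assumptions. It writes to an in-memory SQLite database so that it runs without a server; for MySQL you would instead pass a SQLAlchemy engine, for example one built with create_engine("mysql+pymysql://user:password@host/dbname"), where the credentials are placeholders and SQLAlchemy plus PyMySQL must be installed. The helper save_table, the table name ren_load, and the sample rows are all hypothetical:

```python
import sqlite3

import pandas as pd


def save_table(df, connection, table_name="ren_load"):
    """Write df to a SQL table, replacing any existing table of that name."""
    df.to_sql(table_name, connection, if_exists="replace", index=False)


# Made-up rows standing in for the DataFrame returned by read_html().
df = pd.DataFrame(
    {
        "Data": ["02-10-2020", "02-10-2020"],
        "Hora": ["00:00", "00:15"],
        "Consumo": [5070.0, 5021.9],
    }
)

# SQLite stands in for MySQL here; swap in a SQLAlchemy engine for a real server.
conn = sqlite3.connect(":memory:")
save_table(df, conn)

row_count = conn.execute("SELECT COUNT(*) FROM ren_load").fetchone()[0]
print(row_count)
```

if_exists="replace" drops and recreates the table on every run; use if_exists="append" to add each day's rows to an existing table instead.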