
I'm trying to scrape data from https://www.bps.go.id/indicator/3/1/25/inflasi-umum-.html from 1995 to today. Is there any way I can do it? I'm stuck because each year has its own table with different HTML. Thank you in advance.

Sajan
adinda aulia

1 Answer


First, extract the `value` attribute of each `option` in the `select` tag to get the URL for each year:

```python
import requests
from bs4 import BeautifulSoup
import pandas as pd

baseUrl = "https://www.bps.go.id"
dateFrom = 1995
dateTo = 2019

# get the <option> elements of the year dropdown
r = requests.get(f"{baseUrl}/indicator/3/1/25/inflasi-umum-.html")
r.raise_for_status()
soup = BeautifulSoup(r.text, "html.parser")
# map year label -> relative URL path, skipping options without a value
years = {
    t.text.strip(): t["value"]
    for t in soup.find("select").find_all("option")
    if t.get("value")
}
```
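To check the option-parsing step offline, or without installing BeautifulSoup, the same year-to-path mapping can be sketched with the standard library's `html.parser.HTMLParser` instead (a swapped-in technique, not what the answer uses). `OptionParser`, `extract_year_options`, and the sample HTML with its `/example/...` paths are hypothetical. The sketch assumes every `<option>` is closed with `</option>`, which real pages do not always do, and, like the answer, it skips options without a non-empty `value`.

```python
from html.parser import HTMLParser


class OptionParser(HTMLParser):
    """Collect {label: value} from <option> tags that have a non-empty value.

    Assumes each <option> is explicitly closed with </option>.
    """

    def __init__(self):
        super().__init__()
        self.options = {}
        self._current_value = None  # value of the <option> being read, if any
        self._text_parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "option":
            self._current_value = dict(attrs).get("value")
            self._text_parts = []

    def handle_data(self, data):
        if self._current_value is not None:
            self._text_parts.append(data)

    def handle_endtag(self, tag):
        if tag == "option" and self._current_value is not None:
            if self._current_value:  # skip placeholder options with value=""
                label = "".join(self._text_parts).strip()
                self.options[label] = self._current_value
            self._current_value = None


def extract_year_options(html):
    parser = OptionParser()
    parser.feed(html)
    return parser.options


# Hypothetical HTML standing in for the BPS page; the paths are placeholders.
sample = """
<select>
  <option value="">Select year</option>
  <option value="/example/2018">2018</option>
  <option value="/example/2019">2019</option>
</select>
"""
print(extract_year_options(sample))
```

The result has the same shape as `years` in the answer, so it can be swapped into the loop unchanged.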

Then iterate through your year range, fetch each year's page, and use pandas to extract the table, so you end up with a dictionary that has the year as key and a DataFrame as value:

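```python
from io import StringIO

# iterate through years
data = {}
for n in range(dateFrom, dateTo + 1):
    print(f"get data for year {n}")
    r = requests.get(f"{baseUrl}{years[str(n)]}")
    r.raise_for_status()
    # recent pandas versions expect a file-like object instead of a raw HTML string
    tables = pd.read_html(StringIO(r.text))
    # the first tables are page headers; tables[2] is the data table
    data[str(n)] = tables[2]

print(data)
```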
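The loop stops with a `KeyError` as soon as a year in the range is missing from the dropdown, which becomes likely if you push `dateTo` toward the current year ("to this day"). A minimal sketch, using a hypothetical helper `plan_requests` and a made-up `years` dict, that splits the range into years to fetch and years the page does not offer:

```python
from datetime import date


def plan_requests(years, date_from, date_to=None):
    """Return ([(year, path), ...] to fetch, [years missing from the dropdown]).

    `years` is the {year label: relative path} dict built from the options.
    `date_to` defaults to the current calendar year.
    """
    if date_to is None:
        date_to = date.today().year
    to_fetch, missing = [], []
    for n in range(date_from, date_to + 1):
        path = years.get(str(n))
        if path:
            to_fetch.append((str(n), path))
        else:
            missing.append(n)
    return to_fetch, missing


# Hypothetical dropdown contents: only 2018 and 2019 are offered.
years = {"2018": "/example/2018", "2019": "/example/2019"}
to_fetch, missing = plan_requests(years, 2017, 2020)
print(to_fetch)  # [('2018', '/example/2018'), ('2019', '/example/2019')]
print(missing)   # [2017, 2020]
```

The fetch loop would then iterate `for year, path in to_fetch:` and request `f"{baseUrl}{path}"`, and `missing` tells you which years to look for elsewhere.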


Bertrand Martel
  • Thank you so much! Can I ask why `years` has to be a dict? Can I change it to something else? – adinda aulia Oct 02 '20 at 01:43
  • In the code above, `years` looks like `{ "2019": "/......." }`, so you can use `years["2019"]` in the loop to get the path directly, but you can also use a list if you want – Bertrand Martel Oct 02 '20 at 01:47
  • Okay, thanks! One more question: do you know how to remove the very first output? I want to convert it to CSV. Sorry to bother you. => `[ 0 1 0 DATA SENSUS NaN, 0 1 2 3 0 NaN Facebook NaN Instagram 1 NaN Twitter NaN Youtube` – adinda aulia Oct 02 '20 at 02:03
  • I've updated the code above with `table[2]` to get only the data table, without the header – Bertrand Martel Oct 02 '20 at 02:09
  • Sorry, can I ask one more question? I've been trying to convert the dict to a DataFrame, but it always raises an error, and the same goes for saving to CSV. Can you help me once again? – adinda aulia Oct 02 '20 at 04:26
  • See [this](https://stackoverflow.com/q/18837262/2614364) – Bertrand Martel Oct 02 '20 at 10:39