How can I webscrape if the source HTML doesn't contain the actual number?

Question

Hi, I'm a total newbie to the computer programming world, so I might ask stupid questions. I'm trying to build a web scraping tool in Python to scrape some statistics from the Korean Statistical Office (KOSIS). This is how I did it, and it keeps returning the error "'NoneType' object has no attribute 'find'":


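import requests
from bs4 import BeautifulSoup

url = "https://kosis.kr/statHtml/statHtml.do?orgId=101&tblId=DT_1K31002&conn_path=I2"

res = requests.get(url)
res.raise_for_status()
soup = BeautifulSoup(res.text, "lxml")

# find() returns None when an element is not in the downloaded HTML, and calling
# .find() on None is what raises "'NoneType' object has no attribute 'find'"
table = soup.find("table", attrs={"id": "mainTable"})
if table is None or table.find("tbody") is None:
    raise SystemExit("mainTable/tbody was not found in the downloaded HTML")

# find_all() returns a list of <tr> tags, so print each row's text in turn
for row in table.find("tbody").find_all("tr"):
    print(row.get_text(" ", strip=True))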

I googled my problem and found out that the DOM in the browser is different from the actual HTML source. So I opened the view-source page (view-source:https://kosis.kr/statHtml/statHtml.do?orgId=101&tblId=DT_1K31002&conn_path=I2) and, since I don't know anything about HTML, ran it through codebeautify. It turns out the source code doesn't contain any of the numbers I'm seeing. Huh? Can anyone explain what's happening? Thanks!
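To see this from Python rather than in view-source, you can check whether the element you are looking for is present in the HTML that `requests` actually receives. Below is a minimal sketch using only the standard library's `html.parser`; `has_element_id` is a hypothetical helper name, and the two sample strings merely stand in for a static page and a JavaScript-rendered one.

```python
from html.parser import HTMLParser


class _IdFinder(HTMLParser):
    """Records whether any start tag carries the requested id attribute."""

    def __init__(self, target_id):
        super().__init__()
        self.target_id = target_id
        self.found = False

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for this tag
        if ("id", self.target_id) in attrs:
            self.found = True


def has_element_id(html, target_id):
    """Hypothetical helper: True if the raw HTML contains an element with this id."""
    finder = _IdFinder(target_id)
    finder.feed(html)
    finder.close()
    return finder.found


if __name__ == "__main__":
    # Static markup: the table is already in the source
    static_page = '<table id="mainTable"><tbody><tr><td>42</td></tr></tbody></table>'
    # JavaScript-driven page: the source only has an empty container and a script
    dynamic_page = '<div id="app"></div><script src="render.js"></script>'
    print(has_element_id(static_page, "mainTable"))   # True
    print(has_element_id(dynamic_page, "mainTable"))  # False
```

With the question's code you would call `has_element_id(res.text, "mainTable")`. A `False` result most likely means the table is built by JavaScript after the page loads, so `soup.find(...)` returns `None` and the chained `.find("tbody")` raises the `NoneType` error.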

Oh, I'd better use Selenium to scrape this web page. Thanks a lot! (comment by yeji, Feb 24 '21 at 00:25)

Answer

I recommend using Puppeteer for web scraping (it drives Google Chrome behind the scenes), because many web pages use JavaScript to modify the DOM after the HTML has loaded. As a result, the HTML you download is not the same as the DOM you see once the page has fully loaded.

Here is an introductory article I found: https://rexben.medium.com/introduction-to-web-scraping-with-puppeteer-1465b89fcf0b
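Puppeteer is a Node.js library, while the question's code is Python. A comparable sketch in Python uses Selenium, which the asker mentions in the comment on the question: drive a real Chrome browser so the page's JavaScript runs, wait for the table to appear, then parse `driver.page_source` instead of `res.text`. It assumes Selenium 4 with Chrome installed, that the rendered table keeps the id `mainTable` from the question's code, and that the table is not inside an iframe. The helper names `fetch_rendered_html`, `table_rows`, and `scrape_to_csv` are hypothetical. The table is parsed with the standard library's `html.parser` only so that part can be tried without a browser; BeautifulSoup on `page_source` would work just as well.

```python
import csv
from html.parser import HTMLParser


class _TableRows(HTMLParser):
    """Collects cell text per <tr> inside <table id=table_id> (assumes no nested tables)."""

    def __init__(self, table_id):
        super().__init__()
        self.table_id = table_id
        self.inside = False  # True while parsing the target table
        self.rows = []
        self.cell = None     # text pieces of the current <td>/<th>, or None

    def _flush(self):
        # Finish the current cell, if any, and append its text to the last row
        if self.cell is not None:
            self.rows[-1].append("".join(self.cell).strip())
            self.cell = None

    def handle_starttag(self, tag, attrs):
        if tag == "table" and ("id", self.table_id) in attrs:
            self.inside = True
        elif self.inside and tag == "tr":
            self._flush()
            self.rows.append([])
        elif self.inside and tag in ("td", "th"):
            self._flush()
            if not self.rows:  # tolerate cells that appear before any <tr>
                self.rows.append([])
            self.cell = []

    def handle_endtag(self, tag):
        if tag == "table":
            self._flush()
            self.inside = False
        elif self.inside and tag in ("td", "th"):
            self._flush()

    def handle_data(self, data):
        if self.cell is not None:
            self.cell.append(data)


def table_rows(html, table_id):
    """Return the table as a list of rows, each a list of cell strings."""
    parser = _TableRows(table_id)
    parser.feed(html)
    parser.close()
    return parser.rows


def fetch_rendered_html(url, table_id, timeout=30):
    """Load the page in Chrome so its JavaScript runs, then return the resulting DOM."""
    # Imported here so table_rows() can be used without Selenium installed
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Chrome()  # assumes Selenium 4 and a local Chrome install
    try:
        driver.get(url)
        # Block until the script-generated table is actually present in the DOM
        WebDriverWait(driver, timeout).until(
            EC.presence_of_element_located((By.ID, table_id))
        )
        return driver.page_source
    finally:
        driver.quit()


def scrape_to_csv(url, table_id, path):
    """Render the page, extract the table, and write it to a CSV file."""
    rows = table_rows(fetch_rendered_html(url, table_id), table_id)
    # utf-8-sig adds a BOM so spreadsheet programs display Korean text correctly
    with open(path, "w", newline="", encoding="utf-8-sig") as f:
        csv.writer(f).writerows(rows)
    return len(rows)
```

For the page in the question you would call `scrape_to_csv(url, "mainTable", "kosis.csv")` with the same `url` string. The explicit wait matters: reading `page_source` immediately after `driver.get()` can still return the page before the script has filled in the table.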

edited Feb 23 '21 at 13:48 by Dharman

answered Feb 23 '21 at 13:42 by matymad