
Hi, I'm a complete newbie to the computer programming world, so I might ask stupid questions. I'm trying to build a web scraping tool in Python to scrape some statistics from the Korean Statistical Office (KOSIS). This is what I did, and it keeps returning an error saying "'NoneType' object has no attribute 'find'":


import csv
import requests
from bs4 import BeautifulSoup

url = "https://kosis.kr/statHtml/statHtml.do?orgId=101&tblId=DT_1K31002&conn_path=I2"

res = requests.get(url)
res.raise_for_status()
soup = BeautifulSoup(res.text, "lxml")

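# This line raises the error: 'NoneType' object has no attribute 'find'
data_rows = soup.find("table", attrs={"id": "mainTable"}).find("tbody").find_all("tr")

# find_all returns a list of rows, so get_text has to be called on each row
for row in data_rows:
    print(row.get_text())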

I googled my problem and found out that the DOM in the browser can differ from the actual HTML source. So I opened the view-source page (view-source:https://kosis.kr/statHtml/statHtml.do?orgId=101&tblId=DT_1K31002&conn_path=I2) and, since I don't know anything about HTML, ran it through codebeautify. It turns out the source code doesn't contain any of the numbers I'm seeing on the page. Can anyone explain what's happening? Thanks!

yeji

1 Answer

I recommend using Puppeteer for web scraping (it drives Google Chrome behind the scenes), because many web pages use JavaScript to modify the DOM after the initial HTML has loaded. As a result, the HTML that requests downloads is not the same as the DOM you see in the browser once the page has fully loaded, and the table you are looking for is simply not in it.
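
One way to see this from Python, as a minimal sketch: check whether the table exists in the parsed HTML before chaining further `.find` calls. The two HTML strings below are made-up stand-ins (not the real KOSIS markup) for the raw source and for the DOM after scripts have run, and `find_data_rows` is a hypothetical helper name.

```python
from bs4 import BeautifulSoup

# Made-up stand-ins, not the real KOSIS markup:
# what the server sends before any script has run ...
RAW_HTML = """
<html><body>
  <div id="content"></div>
  <script>/* fills #content with a table after load */</script>
</body></html>
"""

# ... and what the browser's DOM might look like afterwards.
RENDERED_HTML = """
<html><body>
  <div id="content">
    <table id="mainTable"><tbody>
      <tr><td>2020</td><td>100</td></tr>
      <tr><td>2021</td><td>105</td></tr>
    </tbody></table>
  </div>
</body></html>
"""


def find_data_rows(html, table_id="mainTable"):
    """Return the table's rows as lists of cell text, or None if the table is absent."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", attrs={"id": table_id})
    if table is None:
        # soup.find returns None when nothing matches; calling .find on that
        # None is what raises "'NoneType' object has no attribute 'find'".
        return None
    return [
        [cell.get_text(strip=True) for cell in row.find_all(["td", "th"])]
        for row in table.find_all("tr")
    ]


print(find_data_rows(RAW_HTML))       # table is not in the raw source
print(find_data_rows(RENDERED_HTML))  # table is present after rendering
```

If the same check on `res.text` from your script gives `None`, the table is being added by JavaScript and requests alone will not see it.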

Here is an introduction to Puppeteer that I found: https://rexben.medium.com/introduction-to-web-scraping-with-puppeteer-1465b89fcf0b
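
Puppeteer itself is a Node.js library. If you would rather stay in Python, a similar headless-browser approach can be sketched with Playwright, which is a different tool swapped in here (it needs `pip install playwright` and `playwright install chromium`). The `#mainTable` selector is taken from the question; I have not verified whether the table sits in the top-level page or inside a frame on KOSIS, so treat this as a starting point under that assumption rather than a finished scraper. `fetch_rendered_html` and `rows_from_html` are hypothetical helper names.

```python
from bs4 import BeautifulSoup


def fetch_rendered_html(url, wait_selector, timeout_ms=30000):
    """Load url in headless Chromium and return the HTML after scripts have run."""
    # Imported lazily so the parsing helper below works without Playwright installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch()
        try:
            page = browser.new_page()
            page.goto(url)
            # Wait until the script-generated element exists in the DOM.
            # Assumption: the element is in the top-level document, not a frame.
            page.wait_for_selector(wait_selector, timeout=timeout_ms)
            return page.content()
        finally:
            browser.close()


def rows_from_html(html, row_selector="#mainTable tbody tr"):
    """Return each matching row as a list of cell texts (empty list if none match)."""
    soup = BeautifulSoup(html, "html.parser")
    # select returns an empty list when nothing matches, so there is no None to guard.
    return [
        [cell.get_text(strip=True) for cell in row.find_all(["td", "th"])]
        for row in soup.select(row_selector)
    ]
```

Usage would then look like `html = fetch_rendered_html(url, "#mainTable tbody tr")` followed by `rows_from_html(html)`, with `url` being the KOSIS address from the question. If the wait times out, the table is probably inside a frame or uses a different id, and the selector needs adjusting.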

Dharman
matymad