
I would like to save online books like this one, https://kcenter.korean.go.kr/repository/ebook/culture/SB_step3/index.html, which are displayed page by page, as a single PDF.

How to do it?

So far, the only thing I have managed is to print each page to a separate PDF and then combine those single-page PDFs.

Is there a way to do this automatically with Python or another scripting language?


You can download the page images directly with requests and save them to a PDF with PIL (Pillow). For example:

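import requests
from io import BytesIO
from PIL import Image  # pip install Pillow

pdf_path = "doc.pdf"
url = 'https://kcenter.korean.go.kr/repository/ebook/culture/SB_step3/assets/page-images/page-113088-{}.jpg'

images = []
for p in range(1, 4):  # <-- increase number of pages here (now it will save first 3 pages)
    # page numbers in the file names are zero-padded to 4 digits (0001, 0002, ...);
    # verify=False turns off SSL certificate verification for this request
    r = requests.get(url.format(f'{p:04d}'), verify=False)
    r.raise_for_status()  # fail loudly if a page image is missing instead of passing HTML to PIL
    # convert('RGB') makes sure every page is in a mode the PDF writer accepts
    images.append(Image.open(BytesIO(r.content)).convert('RGB'))

# borrowing from this answer: https://stackoverflow.com/a/47283224/10035985
images[0].save(
    pdf_path, "PDF", resolution=100.0, save_all=True, append_images=images[1:]
)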

The resulting doc.pdf can then be opened in Firefox or any other PDF viewer.
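If you don't know how many pages the book has, you can keep requesting pages until one is missing instead of hard-coding range(1, 4). The sketch below assumes the page images are numbered consecutively without gaps and that a missing page comes back with a non-200 status (e.g. 404); if the server instead returns 200 with an error page, or a request fails transiently, the loop would stop or misbehave. The helper names download_pages, fetch_with_requests and save_as_pdf and the max_pages cap are hypothetical, not part of any library. The fetch function is passed in, so the page loop can be tried without network access.

```python
from io import BytesIO
from typing import Callable, List, Optional

# Same URL pattern as in the answer above; {:04d} zero-pads the page number (0001, 0002, ...).
URL_TEMPLATE = (
    "https://kcenter.korean.go.kr/repository/ebook/culture/SB_step3/"
    "assets/page-images/page-113088-{:04d}.jpg"
)


def fetch_with_requests(url: str) -> Optional[bytes]:
    """Return the image bytes, or None if the server does not answer 200 OK."""
    import requests  # imported here so the rest of the sketch works without it

    # verify=False disables certificate verification, as in the answer above
    r = requests.get(url, verify=False, timeout=30)
    return r.content if r.status_code == 200 else None


def download_pages(
    fetch: Callable[[str], Optional[bytes]],
    template: str = URL_TEMPLATE,
    max_pages: int = 2000,
) -> List[bytes]:
    """Fetch page 1, 2, 3, ... and stop at the first page that fetch reports as missing."""
    pages: List[bytes] = []
    for p in range(1, max_pages + 1):
        data = fetch(template.format(p))
        if data is None:  # assumed to mean "no more pages"
            break
        pages.append(data)
    return pages


def save_as_pdf(pages: List[bytes], pdf_path: str) -> None:
    """Combine the downloaded page images into one PDF with Pillow."""
    if not pages:
        raise ValueError("no pages were downloaded")
    from PIL import Image  # pip install Pillow

    images = [Image.open(BytesIO(b)).convert("RGB") for b in pages]
    images[0].save(
        pdf_path, "PDF", resolution=100.0, save_all=True, append_images=images[1:]
    )
```

Usage: save_as_pdf(download_pages(fetch_with_requests), "doc.pdf"). Keeping max_pages as an upper bound means a server that never reports a missing page cannot make the loop run forever.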

Andrej Kesely