I am working on a manga (Japanese comics) downloader. Manga are available online, but only for reading in the browser; if you want to keep a copy, you have to save each image file by right-clicking it, one page at a time.

So I am writing a downloader that fetches all the chapters of a manga that you specify and then converts them to PDF.

The code for downloading the images is finished and works quite well, but the problem is in the PDF conversion part.

Here's my code:

import requests
import urllib
import glob
from bs4 import BeautifulSoup
import os
from fpdf import FPDF

def download_image(url, path):
    r = requests.get(url, stream=True)
    if r.status_code == 200:
        with open(path, 'wb') as f:
            for chunk in r:
                f.write(chunk)


start_chapter = int(input("Enter Starting Chapter: "))
end_chapter = int(input("Enter Ending Chapter: "))

chapters = range(start_chapter, end_chapter + 1)
chapter_list = []

for chapter in chapters:
    chapter_list.append("https://manganelo.com/chapter/read_one_piece_manga_online_free4/chapter_" + str(chapter))

for URL in chapter_list:
    r = requests.get(URL)

    soup = BeautifulSoup(r.text, 'html.parser')
    images = soup.findAll('img')
    for i in images:
        url = i.attrs["src"]
        os.makedirs(url.split('/')[-2], exist_ok=True)
        download_image(url, os.path.join(url.split('/')[-2], url.split('/')[-1]))

pdf = FPDF()
imageList = glob.glob("*")
for image in imageList:
    pdf.add_page()
    pdf.image(image, 10, 10, 200, 300)
pdf.output("One Piece Chapter", "F")

Any suggestions on how I can fix this error?

    raise RuntimeError('FPDF error: '+msg)
RuntimeError: FPDF error: Unsupported image type: chapter_1_romance_dawn
  • You save your files without common file extensions, e.g. `.jpg, .png`. Therefore `pdf.image(...` could not guess from the filename. Add the file type, e.g. `pdf.image(..., type = 'PNG')`. Read [FPDF for Python](https://pyfpdf.readthedocs.io/en/latest/reference/image/index.html) – stovfl Nov 02 '18 at 11:38
  • @stovfl He is saving the files with the extension... `url.split('/')[-1]` will return the name and extension, as it is part of the URL. – Fabian Nov 02 '18 at 12:35

1 Answer

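First of all, this is a very nice idea.

The error occurs because the path in the image list is wrong.
You are storing the JPGs in a folder named after the chapter, but glob.glob("*") runs in the working directory, so it returns the folder name (chapter_1_romance_dawn) instead of the image files inside it.
All you have to do is give FPDF the correct path.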
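To illustrate the difference, here is a minimal sketch that rebuilds a similar folder layout in a temporary directory. The file names 1.jpg and 2.jpg and the empty placeholder files are assumptions made for the demonstration, not the site's real data.

```python
import glob
import os
import tempfile

# Hypothetical layout mimicking what the downloader produces:
# <working dir>/chapter_1_romance_dawn/1.jpg, 2.jpg
workdir = tempfile.mkdtemp()
chapter_dir = os.path.join(workdir, "chapter_1_romance_dawn")
os.makedirs(chapter_dir)
for name in ("1.jpg", "2.jpg"):
    # Empty placeholder files; real downloads would hold image data.
    open(os.path.join(chapter_dir, name), "wb").close()

# What the question's code does: a bare "*" matches the folder itself.
top_level = [os.path.basename(p) for p in glob.glob(os.path.join(workdir, "*"))]

# What FPDF needs: the image files inside the chapter folder.
pages = sorted(
    os.path.basename(p) for p in glob.glob(os.path.join(chapter_dir, "*.jpg"))
)

print(top_level)  # ['chapter_1_romance_dawn']
print(pages)      # ['1.jpg', '2.jpg']
```

The first list contains only the folder name, which is exactly the string that shows up in the "Unsupported image type" message.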

I collect the folder names in a set to avoid duplicates. Then I remove the "images" and "icons" folders (maybe you will want to use them?).

cchapter = set()  # folder names seen so far; a set avoids duplicates
for URL in chapter_list:
    r = requests.get(URL)

    soup = BeautifulSoup(r.text, 'html.parser')
    images = soup.findAll('img')

    for i in images:
        url = i.attrs["src"]
        folder = url.split('/')[-2]  # second-to-last URL segment is the folder name
        cchapter.add(folder)
        os.makedirs(folder, exist_ok=True)
        download_image(url, os.path.join(folder, url.split('/')[-1]))

# Drop the site's own asset folders; discard() does not fail if one is missing
cchapter.discard('images')
cchapter.discard('icons')
chapterlist = list(cchapter)
print(chapterlist[0])

def sortKeyFunc(s):
    # Page files are named by number (e.g. "12.jpg"); strip ".jpg" and sort numerically
    return int(os.path.basename(s)[:-4])

for chap in chapterlist:
    pdf = FPDF()  # one PDF per chapter folder
    imageList = glob.glob(chap + "/*.jpg")
    imageList.sort(key=sortKeyFunc)
    for image in imageList:
        pdf.add_page()
        pdf.image(image, 10, 10, 200, 300)
    pdf.output(chap + ".pdf", "F")

Finally, I added a loop that creates one PDF for each folder
and names each PDF after its chapter.
Your output file name was also missing the extension (".pdf").
This will work. :)

EDIT:

glob.glob does not return the file list in the correct order.

Reference: here

It is probably not sorted at all and uses the order at which entries appear in the filesystem, i.e. the one you get when using ls -U. (At least on my machine this produces the same order as listing glob matches).

Therefore you can use the file name (in our case a number) as the sort key.

def sortKeyFunc(s):
    return int(os.path.basename(s)[:-4])

Then add imageList.sort(key=sortKeyFunc) in the loop.
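Note that sortKeyFunc assumes every file name is a plain number followed by a four-character extension. If a folder might also contain something like cover.jpg, a more tolerant key could look like the sketch below. The name page_number_key and the sample paths are hypothetical, chosen only to show the difference between lexicographic and numeric ordering.

```python
import os
import re

def page_number_key(path):
    """Sort key: leading integer of the file name, or infinity if there is none."""
    stem = os.path.splitext(os.path.basename(path))[0]
    match = re.match(r"\d+", stem)
    # Files without a leading number sort after all numbered pages, by name.
    return (int(match.group()), stem) if match else (float("inf"), stem)

# Hypothetical sample paths, not taken from the real download folder.
names = ["chap/10.jpg", "chap/2.jpg", "chap/1.jpg", "chap/cover.jpg"]

lexicographic = sorted(names)
numeric = sorted(names, key=page_number_key)

print(lexicographic)  # ['chap/1.jpg', 'chap/10.jpg', 'chap/2.jpg', 'chap/cover.jpg']
print(numeric)        # ['chap/1.jpg', 'chap/2.jpg', 'chap/10.jpg', 'chap/cover.jpg']
```

A plain sorted() puts page 10 before page 2, which is why a numeric key is needed in the first place.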

NOTE: Code is updated.

Fabian