
This is my script; every request it makes returns a 502 error:

import scrapy
from scrapy.crawler import CrawlerProcess
import os
from scrapy_splash import SplashRequest
import base64

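class MySpider(scrapy.Spider):
    name = 'screenshot'
    # Splash arguments: return the rendered HTML and a PNG screenshot
    splash_args = {
        'html': 1,
        'png': 1,
        # The HTTP header is 'User-Agent'; USER_AGENT is only Scrapy's setting name
        'headers': {
            'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 '
                          '(KHTML, like Gecko) Chrome/79.0.3945.130 Safari/537.36',
        },
    }

    def start_requests(self):
        for i in os.listdir('html'):
            # Splash itself must be able to read this path; if Splash runs
            # in Docker, the folder has to be mounted into the container
            url = f'file:///home/madboy/stack/email/html/{i}'
            # render.json (not render.html) returns the 'png' key used below
            yield SplashRequest(url, self.parse_result, endpoint='render.json', args=self.splash_args)
            break  # stops after the first file

    def parse_result(self, response):
        # response.data only exists for JSON responses, i.e. render.json
        imgdata = base64.b64decode(response.data['png'])
        filename = 'some_image.png'
        with open(filename, 'wb') as f:
            f.write(imgdata)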


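process = CrawlerProcess(settings={
    'ROBOTSTXT_OBEY': False,
    'CONCURRENT_REQUESTS_PER_DOMAIN': 20,
    'DOWNLOADER_MIDDLEWARES': {
        'scrapy_splash.SplashCookiesMiddleware': 723,
        'scrapy_splash.SplashMiddleware': 725,
        'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware': 810,
    },
    # Also recommended by the scrapy-splash README alongside the middlewares above
    'SPIDER_MIDDLEWARES': {
        'scrapy_splash.SplashDeduplicateArgsMiddleware': 100,
    },
    'DUPEFILTER_CLASS': 'scrapy_splash.SplashAwareDupeFilter',
    'AUTOTHROTTLE_ENABLED': False,
    'SPLASH_URL': 'http://localhost:8050',  # Splash on port 8050
})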
process.crawl(MySpider)
process.start()

Hello, I have a folder full of HTML files, and I want to take a screenshot of each one in both a mobile web view and a desktop browser view. I keep getting 502 errors on the requests I make.
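A minimal sketch of the two-view part, assuming Splash's `viewport` argument (format `<width>x<height>` in pixels) and the `headers` argument of its render endpoints. Responsive pages mostly switch layout on viewport width, so the idea is to request each HTML file twice with different viewport sizes and user agents. `VIEWS`, `splash_args_for`, `screenshot_name`, the 375x812 and 1920x1080 sizes and the iPhone user-agent string are hypothetical choices, not part of the original script.

```python
import os

DESKTOP_UA = ('Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 '
              '(KHTML, like Gecko) Chrome/79.0.3945.130 Safari/537.36')
# Example mobile user agent (hypothetical choice; any phone UA string works)
MOBILE_UA = ('Mozilla/5.0 (iPhone; CPU iPhone OS 13_3 like Mac OS X) '
             'AppleWebKit/605.1.15 (KHTML, like Gecko) Version/13.0.5 '
             'Mobile/15E148 Safari/604.1')

# Hypothetical view definitions: viewport size plus user agent
VIEWS = {
    'desktop': {'viewport': '1920x1080', 'ua': DESKTOP_UA},
    'mobile': {'viewport': '375x812', 'ua': MOBILE_UA},
}


def splash_args_for(view):
    """Return render.json arguments for one view ('desktop' or 'mobile')."""
    cfg = VIEWS[view]
    return {
        'html': 1,
        'png': 1,
        'wait': 0.5,                   # give the page a moment to lay out
        'viewport': cfg['viewport'],   # '<width>x<height>' in pixels
        'headers': {'User-Agent': cfg['ua']},
    }


def screenshot_name(html_file, view):
    """Build an output file name such as 'page1-mobile.png'."""
    stem = os.path.splitext(os.path.basename(html_file))[0]
    return f'{stem}-{view}.png'
```

In the spider you would loop `for view in VIEWS:` inside `start_requests` and yield `SplashRequest(url, self.parse_result, endpoint='render.json', args=splash_args_for(view), meta={'view': view})`, then name the output file with `screenshot_name(i, view)`. This sketch only changes the viewport size and the User-Agent header; it does not emulate other device features.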

I changed the endpoint from render.json to render.html, but it didn't help. If there is an easier way of achieving this, please tell me.
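For reference on why switching endpoints cannot help: with scrapy-splash, `render.html` returns the rendered page's HTML as the response body, while `render.json` returns a JSON object whose `png` key holds a base64-encoded screenshot when `png=1` is passed. Only the JSON response exposes a `.data` attribute, so `response.data['png']` needs `render.json`. Splash typically answers 502 when it fails to load the target page itself (for example, when the file does not exist inside its container), which is independent of the endpoint. The sketch below isolates the decoding step in a hypothetical helper, `save_png_from_splash_json`, that takes the dict scrapy-splash exposes as `response.data`.

```python
import base64
import os


def save_png_from_splash_json(data, out_dir, filename):
    """Decode the base64 'png' field of a render.json result and write it.

    `data` is the parsed JSON dict, i.e. what scrapy-splash exposes as
    response.data when the request used endpoint='render.json' and png=1.
    Returns the path that was written.
    """
    if 'png' not in data:
        # render.json without png=1 has no 'png' key
        raise KeyError("no 'png' key: use endpoint='render.json' with png=1")
    os.makedirs(out_dir, exist_ok=True)
    path = os.path.join(out_dir, filename)
    with open(path, 'wb') as f:
        f.write(base64.b64decode(data['png']))
    return path
```

Inside `parse_result` this would be called as `save_png_from_splash_json(response.data, 'screenshots', 'page1-desktop.png')`, where the directory and file names are placeholders.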

goku
  • I assume you are running Splash using Docker. In that case, you should learn about Docker volumes: by default, Docker containers (such as Splash) do not have access to your hard drive, so in the Docker command that runs Splash you must specify which paths on your disk to mount into which parts of the container. – Gallaecio Jun 22 '20 at 19:04
  • @Gallaecio How do I do that please? – goku Jun 23 '20 at 00:39
  • https://stackoverflow.com/questions/23439126/how-to-mount-a-host-directory-in-a-docker-container – Gallaecio Jun 23 '20 at 08:42
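A minimal sketch of the volume mount described in the comments, assuming Splash is started from the `scrapinghub/splash` image. Mounting the HTML folder at the same path inside the container means the spider's `file:///home/madboy/stack/email/html/...` URLs point at the same files from Splash's point of view; the equivalent shell command is `docker run --rm -p 8050:8050 -v /home/madboy/stack/email/html:/home/madboy/stack/email/html:ro scrapinghub/splash`. The helper name `splash_docker_cmd` is hypothetical. Depending on the Splash version, `file://` URLs may also have to be allowed explicitly; the optional `allowed_schemes` parameter assumes Splash's `--allowed-schemes` option, so check `docker run scrapinghub/splash --help` before relying on it.

```python
def splash_docker_cmd(host_dir, container_dir=None, port=8050,
                      image='scrapinghub/splash', allowed_schemes=None):
    """Build a `docker run` argument list for Splash with a read-only bind mount.

    Mounting host_dir at the same path inside the container (the default)
    keeps the spider's file:// URLs valid inside Splash.
    """
    container_dir = container_dir or host_dir
    cmd = [
        'docker', 'run', '--rm',
        '-p', f'{port}:8050',                     # expose Splash's HTTP API
        '-v', f'{host_dir}:{container_dir}:ro',   # host path -> container path
        image,
    ]
    if allowed_schemes:
        # Assumption: Splash's --allowed-schemes option; arguments after the
        # image name are passed to the Splash server inside the container
        cmd += ['--allowed-schemes', ','.join(allowed_schemes)]
    return cmd
```

You could start Splash with `subprocess.run(splash_docker_cmd('/home/madboy/stack/email/html'), check=True)`, or simply print the list and run the command by hand.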

0 Answers