
I am trying to run this script

import crawler

crawler.crawl(url="https://www.montratec.com", output_dir="crawling_test", method="rendered-all")

from this library: https://github.com/SimFin/pdf-crawler

but I am getting this error:

Expected browser binary location, but unable to find binary in default location, no 'moz:firefoxOptions.binary' capability provided, and no binary flag set on the command line

I already have Firefox installed and I am using Windows.

x89
  • How did you install crawler? What is your Python version? – cruisepandey Jul 13 '21 at 07:03
  • I cloned it from here, @cruisepandey: https://github.com/SimFin/pdf-crawler – x89 Jul 13 '21 at 07:04
  • Can you run `pip install crawler` instead? Also, where is Firefox installed, i.e. in which location? – cruisepandey Jul 13 '21 at 07:07
  • Did you try this? https://stackoverflow.com/questions/65318382/expected-browser-binary-location-but-unable-to-find-binary-in-default-location – cruisepandey Jul 13 '21 at 07:08
  • I am not using Selenium in my code, so I don't know how to apply the solutions suggested above, @cruisepandey. – x89 Jul 13 '21 at 07:19
  • @cruisepandey I want to use crawler from the cloned tool. If I install via pip, it would use that crawler, which is not what I want. – x89 Jul 13 '21 at 07:21

2 Answers

0

If Firefox is installed in a non-default location that is not on your system’s search path, you can either set the binary field on the moz:firefoxOptions capabilities object (documented in the README) or pass the --binary PATH flag to geckodriver when it starts.
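As a rough illustration of the first option, the sketch below looks for firefox.exe in two common Windows install directories and builds a capabilities object with the binary field set. The candidate paths and the helper names `find_firefox` and `firefox_capabilities` are assumptions for illustration only; they are not part of geckodriver or pdf-crawler.

```python
import json
import os

# Common Windows install locations. These are assumptions; adjust for your machine.
CANDIDATE_PATHS = [
    r"C:\Program Files\Mozilla Firefox\firefox.exe",
    r"C:\Program Files (x86)\Mozilla Firefox\firefox.exe",
]


def find_firefox(candidates=CANDIDATE_PATHS):
    """Return the first candidate path that exists on disk, or None."""
    for path in candidates:
        if os.path.isfile(path):
            return path
    return None


def firefox_capabilities(binary_path):
    """Build a capabilities object that carries moz:firefoxOptions.binary."""
    return {
        "capabilities": {
            "alwaysMatch": {
                "browserName": "firefox",
                "moz:firefoxOptions": {"binary": binary_path},
            }
        }
    }


if __name__ == "__main__":
    binary = find_firefox()
    if binary is None:
        print("Firefox not found in the candidate locations")
    else:
        print(json.dumps(firefox_capabilities(binary), indent=2))
```

If neither candidate exists, the real install path has to be supplied by hand; the error in the question means geckodriver could not work it out on its own.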

0

Since the question is tagged Selenium, you can make the following changes to get rid of the error above.

This is a pure Selenium solution: if your code creates the driver itself, configure it with FirefoxOptions as shown below:

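from selenium import webdriver

options = webdriver.FirefoxOptions()
# Point Selenium at the Firefox executable explicitly
options.binary_location = r"C:\Program Files\Mozilla Firefox\firefox.exe"
# Use options= (firefox_options= is deprecated); Selenium 4 prefers service=Service(path) over executable_path
driver = webdriver.Firefox(executable_path=r'\geckodriver.exe full path here', options=options)
driver.get("https://www.montratec.com")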
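If your own code never creates the driver (for example, a library starts Firefox internally), one thing to try is adding the Firefox install directory to PATH before calling the library. This is only a sketch: the helper name `prepend_to_path` and the directory shown are assumptions, and whether geckodriver then finds the binary through the search path may depend on its version.

```python
import os


def prepend_to_path(directory, env=None):
    """Put `directory` at the front of PATH unless it is already listed."""
    env = os.environ if env is None else env
    parts = [p for p in env.get("PATH", "").split(os.pathsep) if p]
    if directory not in parts:
        parts.insert(0, directory)
    env["PATH"] = os.pathsep.join(parts)
    return env["PATH"]


if __name__ == "__main__":
    # Assumed default install directory; replace with your actual location.
    prepend_to_path(r"C:\Program Files\Mozilla Firefox")
    print(os.environ["PATH"].split(os.pathsep)[0])
```

The change only affects the current Python process and anything it launches afterwards, so it would need to run before the call that starts the browser.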

For the crawler package on PyPI (a web scraping framework based on the Python 3 asyncio and aiohttp libraries):

Installation:

pip install crawler

Sample code:

import re
from itertools import islice

from crawler import Crawler, Request

RE_TITLE = re.compile(r'<title>([^<]+)</title>', re.S | re.I)

class TestCrawler(Crawler):
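    def task_generator(self):
        # Read at most the first 100 hosts; the file is closed once the generator finishes
        with open('var/domains.txt') as domains:
            for host in islice(domains, 100):
                host = host.strip()
                if host:
                    yield Request('http://%s/' % host, tag='page')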

    def handler_page(self, req, res):
        print('Result of request to {}'.format(req.url))
        try:
            title = RE_TITLE.search(res.body).group(1)
        except AttributeError:
            title = 'N/A'
        print('Title: {}'.format(title))

bot = TestCrawler(concurrency=10)
bot.run()

Official reference here

cruisepandey