I'm using the Selenium Chrome webdriver to crawl web pages one at a time. For each page I initialize a new driver instance and close it once the crawl has finished. After several runs there are almost 10,000 chrome processes on the system, and they can't be killed with the kill command. How can I handle this problem? Thanks.

The code is as follows:

from bs4 import BeautifulSoup
from selenium import webdriver

@classmethod
def get_content_by_selenium(cls, url):
    content = ''
    cls.driver = None  # reset so a failed start never leaves a stale or undefined driver
    chrome_options = webdriver.ChromeOptions()
    chrome_options.add_argument('--headless')
    chrome_options.add_argument('--start-maximized')
    chrome_options.add_argument('--disable-extensions')
    chrome_options.add_argument('--disable-infobars')
    chrome_options.add_argument('--disable-gpu')
    chrome_options.add_argument('--no-sandbox')
    chrome_options.add_argument('--no-proxy-server')
    chrome_options.add_argument('--disable-dev-shm-usage')
    try:
        # executable_path is optional; if omitted, chromedriver is searched for on PATH.
        cls.driver = webdriver.Chrome(options=chrome_options, executable_path='/home/chromedriver')
        cls.driver.set_page_load_timeout(30)
        cls.driver.get(url)
        html = cls.driver.page_source
        soup = BeautifulSoup(html, 'html.parser')
        # Drop script and style elements so only the visible text remains.
        for script in soup(["script", "style"]):
            script.extract()
        meta = cls.get_meta(soup)
        text = ' '.join(soup.text.split())
        content = ' '.join([meta, text])
    except Exception as e:
        print(e)
        print('webdriver failed, continue running')
    finally:
        # Runs on success and on error; quit() is skipped only if Chrome never started.
        if cls.driver is not None:
            cls.driver.quit()
    # Return after the finally block so it cannot swallow unexpected exceptions.
    return content
Alexander
  • What is cls.get_meta? Does the process open another browser? As you did use .quit(), I will delete my original answer. – Yuan Aug 01 '19 at 03:42
  • First, get_meta is another class method. My process calls get_content_by_selenium many times, so many chrome instances are created. As you can see, I call quit() inside finally, so it executes in every case. – Alexander Aug 01 '19 at 08:47
  • I cannot reproduce your error; on my computer the code quits the browser properly. – Yuan Aug 01 '19 at 08:48
  • I run this in Docker. Could that have any adverse effects? – Alexander Aug 01 '19 at 08:52
  • I do not think there would be any adverse effects. I am running a similar project on my Linux server, and each time I call .quit(), all memory is released. I recommend adding a print after `cls.driver.quit()` to make sure the browser is really closed. There must be an error somewhere else in your code. – Yuan Aug 01 '19 at 11:03
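
Following up on the suggestion to confirm that the browser is really closed: below is a minimal sketch, assuming a Linux system with `ps` available, that counts processes whose command name contains `chrome` and separates live ones from zombies (state `Z`). The helper names `parse_ps` and `count_chrome_processes` are made up for illustration and are not part of Selenium. Note that the match also picks up `chromedriver`.

```python
import subprocess


def parse_ps(ps_output, name="chrome"):
    """Return (alive, zombies) for processes whose command contains `name`.

    Expects one process per line in the form "PID STAT COMMAND",
    as printed by `ps -e -o pid= -o stat= -o comm=`.
    """
    alive = zombies = 0
    for line in ps_output.splitlines():
        parts = line.split(None, 2)  # pid, state, command (may contain spaces)
        if len(parts) < 3 or name not in parts[2]:
            continue
        if parts[1].startswith("Z"):  # Z = exited but not yet reaped by its parent
            zombies += 1
        else:
            alive += 1
    return alive, zombies


def count_chrome_processes(name="chrome"):
    """Run ps once and count matching processes (hypothetical helper)."""
    result = subprocess.run(
        ["ps", "-e", "-o", "pid=", "-o", "stat=", "-o", "comm="],
        capture_output=True, text=True, check=True,
    )
    return parse_ps(result.stdout, name)
```

Printing `count_chrome_processes()` right after `cls.driver.quit()` would show whether the numbers keep growing from call to call. If the leftovers are mostly zombies, they have already exited and are only waiting to be reaped by their parent, which would explain why `kill` has no effect on them. Inside a container this can happen when the process running as PID 1 does not reap its children; starting the container with Docker's `--init` option is one thing to try.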

0 Answers