I have a web scraper that uses Selenium (code below). Each time the code runs, new processes appear in htop and never go away: both a chromedriver process and an Xvfb process, as can be seen here: https://i.stack.imgur.com/8wWql.png. I ran the function five times, and there are five Xvfb processes open (and seven chromedriver processes, for some reason). I call display.stop() and driver.close(), which I expected to prevent this. My code frequently throws an error, but the same stop/close calls are in the except block too, so that should not affect the cleanup. I'm running Python 2.7, if that's relevant. The scraper works fine apart from this RAM-clogging issue.

What is going on?

from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support.ui import Select
from bs4 import BeautifulSoup
import time
import json
import traceback
from pyvirtualdisplay import Display

def scrapeBank(bank, return_dict):

    try:
        display = Display(visible=0, size=(800, 600))
        display.start()
        options = webdriver.ChromeOptions()
        options.add_argument('--no-sandbox')
        options.add_argument('--disable-extensions')
        options.add_argument('--headless')
        options.add_argument('--disable-gpu')
        driver = webdriver.Chrome(chrome_options=options)

        [do a bunch of stuff]

        print('Bank Scrape completed')  
        display.stop()
        driver.close()
        return_dict['transactions'] = transactions

    except:
        display.stop()
        driver.close()
        print(traceback.format_exc())
mcplums
  • I personally think the problem is with the closing; it looks like the exception handling doesn't work properly (as you describe it). When I was playing around with this, I was stopping the driver in processes, so I recommend trying that as a test to see how it goes :) – StyleZ Mar 09 '19 at 14:47
  • Well, I have the same problem without an error being thrown, so I don't think it's specific to the except statement. What exactly do you mean by 'I was stopping the driver in processes'? Just manually ending the process? I'm running Ubuntu, command line only... – mcplums Mar 09 '19 at 14:52
  • I was doing it on a Mac. I don't have my laptop near me, but it was basically terminating the driver... I think it was a kill command in a console – StyleZ Mar 09 '19 at 14:53
  • Sure, well, all the processes quit if I manually exit the Python script, so I can just do that to clear out the extra processes. I'm not sure what else I can do to test this. I have the correct code, it doesn't work... I'm out of ideas... – mcplums Mar 09 '19 at 14:54
  • You need to close the driver before you stop the display it is running in. – Corey Goldberg Mar 09 '19 at 18:33
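
A minimal sketch of the ordering suggested in the last comment, offered as one possible approach rather than a confirmed fix. It relies on two things: in Selenium, driver.close() only closes the current browser window, while driver.quit() ends the session and shuts down the chromedriver process; and a finally block runs whether or not an exception was raised. The names run_with_cleanup, start_display, start_driver and work are hypothetical helpers introduced for illustration; they are not part of Selenium or pyvirtualdisplay.

```python
import traceback


def run_with_cleanup(start_display, start_driver, work):
    """Run work(driver), then always quit the driver BEFORE stopping the display.

    start_display and start_driver are zero-argument callables (hypothetical
    names) that return an object with .stop() and .quit() respectively.
    """
    display = None
    driver = None
    try:
        display = start_display()
        driver = start_driver()
        return work(driver)
    except Exception:
        # Same reporting as in the question; the cleanup itself lives in finally.
        print(traceback.format_exc())
        return None
    finally:
        # Only clean up what was actually created, so a failure while
        # starting the driver does not raise a second error here.
        if driver is not None:
            driver.quit()   # quit(), not close(): also ends chromedriver
        if display is not None:
            display.stop()  # stop the virtual display last


# Possible wiring with the question's libraries (an assumption, not executed here):
#
#   from selenium import webdriver
#   from pyvirtualdisplay import Display
#
#   def start_display():
#       display = Display(visible=0, size=(800, 600))
#       display.start()
#       return display
#
#   def start_driver():
#       options = webdriver.ChromeOptions()
#       options.add_argument('--headless')
#       return webdriver.Chrome(chrome_options=options)
```

Initialising display and driver to None before the try block matters: in the question's code, if webdriver.Chrome() itself fails, the except block refers to a driver that was never assigned. Since Chrome is already started with --headless, it may also be worth testing whether the virtual display is needed at all.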
