I am following this tutorial on web scraping https://www.linkedin.com/pulse/how-easy-scraping-data-from-linkedin-profiles-david-craven/. The python script is generating errors and I've already tried adding the directory to the PATH and it shows when I echo the path to the screen, but now it shows "/Users/owner/Users/owner" when there should just be one "Users/owner" in the path.
I'm using bash inside mac os High Sierra and am a data science major so DevOps is a challenge for me as well as learning how to post code to StackOverflow but I'm trying to document my steps so it will be easier to troubleshoot this.
- I pip installed selenium
- I downloaded chromedriver to the directory for my webscraping script file and double clicked it to run
- I thought I added the directory to my PATH with 'export PATH=$PATH:~opt/bin:~/Users/owner/sbox/test/pandas_sqlite_dbase/chromedriver' which are the directions I found from http://osxdaily.com/2014/08/14/add-new-path-to-path-command-line/
- I updated PIP
- The directory I want to run the script from is '/Users/owner/sbox/test/pandas_sqlite_dbase'
- There was another SO post Can a website detect when you are using selenium with chromedriver? that talked about how chromedriver with selenium was now auto detected and disabled... so am I trying to scrape with an outdated code base?
- I can post my whole PATH or give other info.
from selenium import webdriver
driver = webdriver.Chrome('~/Users/owner/sbox/test/pandas_sqlite_dbase/googlechrome')
driver.get('https://www.linkedin.com')
Now I am getting a traceback error
Traceback (most recent call last):
File "/Users/owner/anaconda3/lib/python3.7/site-packages/selenium/webdriver/common/service.py", line 76, in start
stdin=PIPE)
File "/Users/owner/anaconda3/lib/python3.7/subprocess.py", line 775, in __init__
restore_signals, start_new_session)
File "/Users/owner/anaconda3/lib/python3.7/subprocess.py", line 1522, in _execute_child
raise child_exception_type(errno_num, err_msg, err_filename)
FileNotFoundError: [Errno 2] No such file or directory: '~/Users/owner/sbox/test/pandas_sqlite_dbase/googlechrome': '~/Users/owner/sbox/test/pandas_sqlite_dbase/googlechrome'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/Users/owner/sbox/test/pandas_sqlite_dbase/scraping_tutorial.py", line 7, in <module>
driver = webdriver.Chrome('~/Users/owner/sbox/test/pandas_sqlite_dbase/googlechrome')
File "/Users/owner/anaconda3/lib/python3.7/site-packages/selenium/webdriver/chrome/webdriver.py", line 73, in __init__
self.service.start()
File "/Users/owner/anaconda3/lib/python3.7/site-packages/selenium/webdriver/common/service.py", line 83, in start
os.path.basename(self.path), self.start_error_message)
selenium.common.exceptions.WebDriverException: Message: 'googlechrome' executable needs to be in PATH. Please see https://sites.google.com/a/chromium.org/chromedriver/home
[Finished in 0.7s with exit code 1]
[shell_cmd: python -u "/Users/owner/sbox/test/pandas_sqlite_dbase/scraping_tutorial.py"]
[dir: /Users/owner/sbox/test/pandas_sqlite_dbase]
[path: /usr/bin:/bin:/usr/sbin:/sbin]