
I am following this tutorial on web scraping: https://www.linkedin.com/pulse/how-easy-scraping-data-from-linkedin-profiles-david-craven/. The Python script is raising errors. I have already tried adding the directory to my PATH, and it shows up when I echo the PATH, but it now appears as "/Users/owner/Users/owner" when there should be only one "Users/owner" in the path.

I'm using bash on macOS High Sierra. I'm a data science major, so DevOps is a challenge for me, as is learning how to post code to StackOverflow, but I'm trying to document my steps to make this easier to troubleshoot.

  1. I installed selenium with pip.
  2. I downloaded chromedriver to the directory that holds my web scraping script and double-clicked it to run it.
  3. I thought I had added the directory to my PATH with 'export PATH=$PATH:~opt/bin:~/Users/owner/sbox/test/pandas_sqlite_dbase/chromedriver', following the directions at http://osxdaily.com/2014/08/14/add-new-path-to-path-command-line/
  4. I updated pip.
  5. The directory I want to run the script from is '/Users/owner/sbox/test/pandas_sqlite_dbase'.
  6. Another SO post, "Can a website detect when you are using selenium with chromedriver?", said that chromedriver with selenium is now automatically detected and disabled. Am I trying to scrape with an outdated code base?
  7. I can post my whole PATH or give other info.
    from selenium import webdriver

    driver = webdriver.Chrome('~/Users/owner/sbox/test/pandas_sqlite_dbase/googlechrome')


    driver.get('https://www.linkedin.com')

Now I am getting this traceback:

Traceback (most recent call last):
  File "/Users/owner/anaconda3/lib/python3.7/site-packages/selenium/webdriver/common/service.py", line 76, in start
    stdin=PIPE)
  File "/Users/owner/anaconda3/lib/python3.7/subprocess.py", line 775, in __init__
    restore_signals, start_new_session)
  File "/Users/owner/anaconda3/lib/python3.7/subprocess.py", line 1522, in _execute_child
    raise child_exception_type(errno_num, err_msg, err_filename)
FileNotFoundError: [Errno 2] No such file or directory: '~/Users/owner/sbox/test/pandas_sqlite_dbase/googlechrome': '~/Users/owner/sbox/test/pandas_sqlite_dbase/googlechrome'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/owner/sbox/test/pandas_sqlite_dbase/scraping_tutorial.py", line 7, in <module>
    driver = webdriver.Chrome('~/Users/owner/sbox/test/pandas_sqlite_dbase/googlechrome')
  File "/Users/owner/anaconda3/lib/python3.7/site-packages/selenium/webdriver/chrome/webdriver.py", line 73, in __init__
    self.service.start()
  File "/Users/owner/anaconda3/lib/python3.7/site-packages/selenium/webdriver/common/service.py", line 83, in start
    os.path.basename(self.path), self.start_error_message)
selenium.common.exceptions.WebDriverException: Message: 'googlechrome' executable needs to be in PATH. Please see https://sites.google.com/a/chromium.org/chromedriver/home

[Finished in 0.7s with exit code 1]
[shell_cmd: python -u "/Users/owner/sbox/test/pandas_sqlite_dbase/scraping_tutorial.py"]
[dir: /Users/owner/sbox/test/pandas_sqlite_dbase]
[path: /usr/bin:/bin:/usr/sbin:/sbin]
Vadim Kotov

1 Answer

I would check what '~' actually expands to, because it seems you have misunderstood it. It is usually your home directory, which for your user is "/Users/owner". That is why you are getting "/Users/owner/Users/owner".

To check this, you can

$>cd ~
$>pwd
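
The same check can be done from Python. Note that Python does not expand '~' on its own, which is why the traceback shows the path with a literal '~' in it. Below is a minimal sketch using the standard library's os.path.expanduser. The helper name resolve_driver_path is made up for illustration, and the "fixed" path only assumes that chromedriver sits in the script directory mentioned in the question.

```python
import os


def resolve_driver_path(raw_path):
    """Expand a leading '~' to the home directory and return an absolute path."""
    return os.path.abspath(os.path.expanduser(raw_path))


# Path in the style used in the question: '~' already stands for the home
# directory, so repeating 'Users/owner' after it doubles that part of the path.
doubled = resolve_driver_path('~/Users/owner/sbox/test/pandas_sqlite_dbase/chromedriver')

# Assumed corrected form: everything after '~/' is relative to the home directory.
fixed = resolve_driver_path('~/sbox/test/pandas_sqlite_dbase/chromedriver')

# Print each expanded path and whether a file really exists there.
for label, path in (('doubled', doubled), ('fixed', fixed)):
    print(label, path, 'exists:', os.path.isfile(path))
```

If "exists" prints False for the path you pass to webdriver.Chrome, selenium will not be able to start the driver from it either.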
Greco
  • Greco, I recall that '~' or tilde is a reference to a relative path. Even if I remove the tilde from the path or remove the /Users/owner from the script the error is still 'googlechrome' executable needs to be in PATH. Oh wait, after re-reading your reply. You are saying the doubled up "Users/owner" is due to my tilde in PATH? – Seattle Python Noobie Sep 10 '19 at 08:13
  • Greco, I removed the second "User/owner" and added that to my path, then also removed the (stuff in here). It works! – Seattle Python Noobie Sep 10 '19 at 08:33