
I'm working on a project to automate a few thousand Google searches, check whether each one returns "no results found", and store the outcome in an array.

I'm using BeautifulSoup, but I can't even get started with loading the HTML from a URL:

from bs4 import BeautifulSoup
import requests

html = requests.get('www.lifehack.org')
soup = BeautifulSoup(html,'html.parser')

The packages install fine, but I get this error:

MissingSchema                             Traceback (most recent call last)
<ipython-input-28-8e881302fa25> in <module>
      1 from bs4 import BeautifulSoup
      2 import requests
----> 3 html = requests.get('www.lifehack.org')
      4 soup = BeautifulSoup(html,'html.parser')

C:\Program Files (x86)\Anaconda\lib\site-packages\requests\api.py in get(url, params, **kwargs)
     73 
     74     kwargs.setdefault('allow_redirects', True)

+Many more lines of similar stuff

I'm not sure how to fix this. I want to be able to pull the HTML straight into the program rather than copying it and saving it to a local HTML file.

Any help would be much appreciated. Thanks.

Nazim Kerimbekov
  • You're "searching searches" by the thousands? Google searches? Have you considered using Google's API instead? I typically use BeautifulSoup to pull data for one-off, specific things when there's no API. Your question is probably a duplicate of the linked question. Are you using Python 3? https://stackoverflow.com/questions/17309288/importerror-no-module-named-requests – Ant Dec 27 '19 at 00:29
  • As @Ant mentioned, if you try to automate Google search results, you will sooner or later get your IP blocked by Google, as it infringes the ToS. Take a look at the GoogleSearch API. – c24b Dec 28 '19 at 17:53

1 Answer


First off, you should post your full error message. There's no reliable way to troubleshoot the issue from only part of it.

That said, the immediate problem is that your URL needs to be fully qualified, meaning it has to start with a scheme such as http:// or https://.

html = requests.get('http://www.lifehack.org')

In fact, if you had posted the full error you get from executing your code, you would have seen something like this, which gives you your answer:

MissingSchema: Invalid URL 'www.lifehack.org': No schema supplied. Perhaps you meant http://www.lifehack.org?
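Since the plan is to feed in a few thousand URLs, it may be worth normalising them before they reach requests. Below is a minimal sketch of that idea; the helper name `with_scheme` and the choice of `https` as the default scheme are assumptions for illustration, not something requests provides.

```python
def with_scheme(url, default="https"):
    """Return url unchanged if it already starts with http(s)://,
    otherwise prepend the default scheme (hypothetical helper)."""
    if url.lower().startswith(("http://", "https://")):
        return url
    # Drop any leading slashes so '//host' does not become 'https:////host'.
    return f"{default}://{url.lstrip('/')}"
```

With this in place, `requests.get(with_scheme('www.lifehack.org'))` would be given a fully qualified URL and should no longer raise MissingSchema.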

Once you fix that, you run into another issue:

Traceback (most recent call last):
  File "", line 4, in <module>
    soup = BeautifulSoup(html,'html.parser')
  File "C:\bs4\__init__.py", line 267, in __init__
    elif len(markup) <= 256 and (
TypeError: object of type 'Response' has no len()

Your html variable is a Response object, and you can't pass that directly to BeautifulSoup. You want to pass the text that you get from the response instead.

soup = BeautifulSoup(html.text,'html.parser')
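Putting both fixes together, a rough sketch could look like the following. The function names `parse_html` and `fetch_soup`, the 10-second timeout, and the `raise_for_status()` check are my own additions rather than part of the original code, so treat them as assumptions.

```python
import requests
from bs4 import BeautifulSoup


def parse_html(markup):
    """Parse an HTML string (not a Response object) into a soup."""
    return BeautifulSoup(markup, "html.parser")


def fetch_soup(url, timeout=10.0):
    """Download url and return the parsed page (hypothetical helper)."""
    response = requests.get(url, timeout=timeout)
    # Raise an HTTPError for 4xx/5xx replies instead of parsing an error page.
    response.raise_for_status()
    # response.text is the decoded body as a str, which BeautifulSoup accepts.
    return parse_html(response.text)
```

Calling `fetch_soup('http://www.lifehack.org')` should then return a BeautifulSoup object you can query, for example with `soup.title`.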

Moral of the story: pay attention to your error messages; they are your guides.
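If you end up looping over thousands of URLs, you can also surface that message in code instead of scrolling through a traceback. This is only a sketch: `describe_url_error` is a hypothetical name, and it relies on `requests.Request(...).prepare()`, which validates the URL without sending anything over the network.

```python
import requests


def describe_url_error(url):
    """Return 'ExceptionName: message' if requests rejects the URL,
    or None if the URL can be prepared (hypothetical helper)."""
    try:
        # prepare() builds the request and validates the URL; nothing is sent.
        requests.Request("GET", url).prepare()
    except requests.exceptions.RequestException as exc:
        return f"{type(exc).__name__}: {exc}"
    return None
```

For 'www.lifehack.org' this should return a string starting with MissingSchema; the exact wording after that depends on your requests version.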

Julian Drago