urllib.request.urlopen not accepting query string with spaces

Question

I am taking a Udacity course on Python in which we check a document for profane words. I am using the website http://www.wdylike.appspot.com/?q= (text_to_be_checked_for_profanity). The text to check is passed as a query string in that URL, and the website returns true or false after checking it for profane words. Below is my code.

import urllib.request

# Read the content from a document
def read_content():

    quotes = open("movie_quotes.txt")
    content = quotes.read()
    quotes.close()
    check_profanity(content)



def check_profanity(text_to_read):
    connection = urllib.request.urlopen("http://www.wdylike.appspot.com/?q="+text_to_read)
    result = connection.read()
    print(result)
    connection.close()

read_content()

It gives me the following error:

Traceback (most recent call last):
   File "/Users/Vrushita/Desktop/Rishit/profanity_check.py", line 21, in <module>
     read_content()
   File "/Users/Vrushita/Desktop/Rishit/profanity_check.py", line 11, in read_content
     check_profanity(content)
   File "/Users/Vrushita/Desktop/Rishit/profanity_check.py", line 16, in check_profanity
     connection = urllib.request.urlopen("http://www.wdylike.appspot.com/?q="+text_to_read)
   File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/urllib/request.py", line 163, in urlopen
     return opener.open(url, data, timeout)
   File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/urllib/request.py", line 472, in open
     response = meth(req, response)
   File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/urllib/request.py", line 582, in http_response
     'http', request, response, code, msg, hdrs)
   File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/urllib/request.py", line 510, in error
     return self._call_chain(*args)
   File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/urllib/request.py", line 444, in _call_chain
     result = func(*args)
   File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/urllib/request.py", line 590, in http_error_default
     raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 400: Bad Request

The document I am reading from contains the string "Hello world". However, if I change the string to "Hello+world", the same code works and returns the desired result. Can someone explain why this happens and how to work around it?

`urllib` accepts it, the *server* doesn't. And well it should not, because a space is not a valid URL character. — Martijn Pieters, Dec 18 '16 at 17:34
Possible duplicate of [How to formally insert URL space (%20) using Python?](http://stackoverflow.com/questions/32762219/how-to-formally-insert-url-space-20-using-python) — Valentin Lorentz, Dec 18 '16 at 17:35
As @MartijnPieters said, spaces are not allowed in URLs. You may think they are, because browsers silently encode them to `%20` or `+`; but you are outside a browser here, so you have to do it yourself. — Valentin Lorentz, Dec 18 '16 at 17:37
@DanielRoseman Bad idea, the error will show up with other characters. The right way is to use `urllib.parse.quote` or `urllib.parse.quote_plus`. — Valentin Lorentz, Dec 18 '16 at 17:38
@ValentinLorentz that was exactly the confusion I had. Thank you for clarification. — Rishit Shah, Dec 18 '16 at 17:51

Martijn Pieters · Accepted Answer · 2020-04-15T11:06:20.030

6

urllib accepts it, the server doesn't. And well it should not, because a space is not a valid URL character.

Escape your query string properly with urllib.parse.quote_plus(); it'll ensure your string is valid for use in query parameters. Or better still, use the urllib.parse.urlencode() function to encode all key-value pairs:

import urllib.request
from urllib.parse import urlencode

params = urlencode({'q': text_to_read})
connection = urllib.request.urlopen(f"http://www.wdylike.appspot.com/?{params}")
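
As a rough sketch of the two options named above, the following builds the request URL both ways without touching the network. `BASE_URL` and `build_url` are names made up for illustration; only `quote_plus` and `urlencode` come from the standard library.

```python
from urllib.parse import quote_plus, urlencode

# Assumed for illustration: the endpoint from the question, without the query.
BASE_URL = "http://www.wdylike.appspot.com/"


def build_url(text):
    """Hypothetical helper: return the full URL with `text` as the q parameter."""
    return BASE_URL + "?" + urlencode({"q": text})


# Quoting a single value by hand with quote_plus gives the same query string.
manual = BASE_URL + "?q=" + quote_plus("Hello world")

print(build_url("Hello world"))  # http://www.wdylike.appspot.com/?q=Hello+world
print(manual == build_url("Hello world"))  # True
```

`urlencode()` is the more convenient choice once there is more than one parameter, since it also inserts the `&` separators and encodes the keys.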

edited Apr 15 '20 at 11:06

answered Dec 18 '16 at 17:37


Note that this is only for Python 2. See below answers for Python 3 code. – luca76 Apr 15 '20 at 08:41
@luca76: no, this answer does **not** work on Python 2. It's explicitly for Python 3. – Martijn Pieters Apr 15 '20 at 11:00
@luca76: for starters, in Python 2 you'd have to use `from urllib import urlencode`. I did have a typo in the import line however, now corrected. – Martijn Pieters Apr 15 '20 at 11:02

score 5 · Answer 2 · answered Apr 02 '17 at 11:41

The response below is for Python 3.x. The 400 Bad Request error occurs when the input text contains a space. To avoid it, use the parse module, so import it:

from urllib import request, parse

If you are sending any text along with the URL, percent-encode it with parse.quote() first.

url = "http://www.wdylike.appspot.com/?q="
url = url + parse.quote(input_to_check)

Check the explanation here - https://discussions.udacity.com/t/problem-in-profanity-with-python-3-solved/227328
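
For comparison, here is a small sketch of how `parse.quote()` differs from `parse.quote_plus()`, which is mentioned elsewhere in this thread. The sample string is made up for illustration; either encoding of the space is one a server can accept in a query string.

```python
from urllib.parse import quote, quote_plus, unquote_plus

sample = "Hello world & a/b"  # made-up sample text

# quote() turns spaces into %20 and, by default, leaves "/" untouched.
encoded_quote = quote(sample)
# quote_plus() turns spaces into "+" and also escapes "/".
encoded_plus = quote_plus(sample)

print(encoded_quote)  # Hello%20world%20%26%20a/b
print(encoded_plus)   # Hello+world+%26+a%2Fb

# Decoding the quote_plus() form gives back the original text.
print(unquote_plus(encoded_plus) == sample)  # True
```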

The complete Udacity profanity checker program:

from urllib import request, parse

def read_file():
    fhand = open(r"E:\Python_Programming\Udacity\movie_quotes.txt")
    file_content = fhand.read()
    #print (file_content)
    fhand.close()
    profanity_check(file_content)

def profanity_check(input_to_check):
    url = "http://www.wdylike.appspot.com/?q="
    url = url + parse.quote(input_to_check)
    req = request.urlopen(url)
    answer = req.read()
    #print(answer)
    req.close()

    if b"true" in answer:
        print("Profanity Alert!!!")
    else:
        print("Nothing to worry about")


read_file()

score 1 · Answer 3 · answered Apr 09 '18 at 02:54

I think this code is closer to what the lesson was aiming for, illustrating the difference between built-in functions, classes, and functions inside classes:

from urllib import request, parse

def read_text():
    quotes = open('C:/Users/Alejandro/Desktop/movie_quotes.txt', 'r')
    contents_of_file = quotes.read()
    print(contents_of_file)
    check_profanity(contents_of_file)
    quotes.close()

def check_profanity(text_to_check):
    connection = request.urlopen('http://www.wdylike.appspot.com/?q=' + parse.quote(text_to_check))
    output = connection.read()
    # print(output)
    connection.close()

    if b"true" in output:
        print("Profanity Alert!!!")
    elif b"false" in output:
        print("This document has no curse words!")
    else:
        print("Could not scan the document properly")

read_text()
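
The three-way branch above can also be pulled into a small pure function, which makes it possible to exercise it without a network call. This is only a sketch: `interpret` is a hypothetical name, and it assumes the service replies with a body containing `true` or `false`, as described in the question.

```python
def interpret(body):
    """Hypothetical helper: map the raw response bytes to a message."""
    if b"true" in body:
        return "Profanity Alert!!!"
    if b"false" in body:
        return "This document has no curse words!"
    # Anything else (empty body, error page) is treated as a failed scan.
    return "Could not scan the document properly"


print(interpret(b"true"))
print(interpret(b"false"))
print(interpret(b""))
```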

Joey · Answer 4 · 2019-03-14T22:37:34.293

I'm working on the same project, also using Python 3 like most people.

While looking for the solution in Python 3, I found this HowTo, and I decided to give it a try.

It seems that some websites, including Google, do not always respond properly to connections made from code (for example, via the urllib module). Apparently this has to do with the User-Agent header, which the website receives when the connection is made.

I did some further research and came up with the following solution:

First I imported URLopener from urllib.request and created a class called ForceOpen as a subclass of URLopener.

Now I could set a "regular" User-Agent by assigning the class attribute version inside ForceOpen. Then I created an instance of it and used its open method in place of urlopen to open the URL.

(It works fine, but I'd still appreciate comments, suggestions, or any feedback, because I'm not absolutely sure this approach is a good alternative. Many thanks.)

from urllib.request import URLopener


class ForceOpen(URLopener):  # create a subclass of URLopener
    version = "Mozilla/5.0 (cmp; Konqueror ...)(Kubuntu)"

force_open = ForceOpen()  # create an instance of it


def read_text():
    quotes = open(
        "/.../profanity_editor/data/quotes.txt"
    )
    contents_of_file = quotes.read()
    print(contents_of_file)
    quotes.close()
    check_profanity(contents_of_file)


def check_profanity(text_to_check):
    # now use the open method to open the URL
    connection = force_open.open(
        "http://www.wdylike.appspot.com/?q=" + text_to_check
    )
    output = connection.read()
    connection.close()

    if b"true" in output:
        print("Attention! Curse word(s) have been detected.")

    elif b"false" in output:
        print("No curse word(s) found.")

    else:
        print("Error! Unable to scan document.")


read_text()
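
Since `URLopener` has been deprecated since Python 3.3, one possible alternative is to pass the header through `urllib.request.Request` and encode the query explicitly. The sketch below makes a few assumptions: `USER_AGENT` is a placeholder value, and `build_request` and `fetch` are hypothetical helper names. Nothing is sent over the network until `fetch()` is called.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Placeholder value, not a recommendation for any particular browser string.
USER_AGENT = "Mozilla/5.0 (compatible; example-script)"


def build_request(text):
    """Hypothetical helper: encoded URL plus a custom User-Agent header."""
    url = "http://www.wdylike.appspot.com/?" + urlencode({"q": text})
    return Request(url, headers={"User-Agent": USER_AGENT})


def fetch(text):
    """Send the request and return the raw response body as bytes."""
    with urlopen(build_request(text)) as connection:
        return connection.read()


req = build_request("Hello world")
print(req.full_url)  # http://www.wdylike.appspot.com/?q=Hello+world
```

Calling `fetch(contents_of_file)` would then take the place of `force_open.open(...)` followed by `connection.read()`.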
