
I have the following program, in which I am trying to pass a list of elements to consecutive Google searches:

search_terms = ['Telejob (ETH)', 'Luisa da Silva','The CERN Recruitment Services']
for el in search_terms:
    webpage = 'http://google.com/search?q='+el
    print('xxxxxxxxxxxxxxxxxxx')
    print(webpage)

Unfortunately my program is not taking ALL the words in each list item, but taking only the first one, giving me this output:

http://google.com/search?q=Telejob (ETH)
xxxxxxxxxxxxxxxxxxx
http://google.com/search?q=Luisa da Silva
xxxxxxxxxxxxxxxxxxx
http://google.com/search?q=The CERN Recruitment Services
xxxxxxxxxxxxxxxxxxx
http://google.com/search?q=The Swiss National Science Foundation

Although the whole item, with every word, appears in the URLs printed above, when I check the link only the first word of each item has been concatenated, like this:

http://google.com/search?q=Telejob
xxxxxxxxxxxxxxxxxxx
http://google.com/search?q=Luisa
xxxxxxxxxxxxxxxxxxx
http://google.com/search?q=The
xxxxxxxxxxxxxxxxxxx
http://google.com/search?q=The

What am I doing wrong and what's the solution to concatenate ALL the words in each list item to the google search?

Thank you

skeitel
  • See [here](http://stackoverflow.com/questions/19353368/passing-string-variable-with-spaces). Different language, same problem, same solution. – Arya McCarthy Apr 07 '17 at 17:39

6 Answers


I believe your problem is with url-encoding.

To include spaces in a URL, each one must be replaced by '%20'.

Try changing your links to be like

https://www.google.com/search?q=The%20CERN%20Recruitment%20Services
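To check that '%20' really stands for a space, here is a minimal sketch that decodes the link text above. It assumes Python 3 and uses only the standard library; the variable names are illustrative.

```python
from urllib.parse import unquote

# The percent-encoded query value from the link above
encoded = 'The%20CERN%20Recruitment%20Services'

# unquote() reverses percent-encoding, turning each %20 back into a space
decoded = unquote(encoded)
print(decoded)
```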

Brian H

This line:

webpage = 'http://google.com/search?q='+el

should be split and joined with a %20 joiner:

webpage = 'http://google.com/search?q='+'%20'.join(el.split())
JacobIRR
  • I am new. What's the disadvantage of using this method over urllib from Evans Murithi above? – skeitel Apr 07 '17 at 18:39
  • My solution specifically deals with spaces. It has less coverage than the URL lib solution, but doesn't require an import... I'm just answering the question directly without much more context... either answer could be right depending on the needs of the asker. – JacobIRR Apr 07 '17 at 18:41

You can use urllib.parse.urlencode in Python 3. For Python 2, use urllib.urlencode.

import urllib.parse  # Python 3: a bare "import urllib" does not reliably load the parse submodule

search_terms = ['Telejob (ETH)', 'Luisa da Silva','The CERN Recruitment Services']
for el in search_terms:
    query = urllib.parse.urlencode({'q': el})  # Python 2: urllib.urlencode({'q': el})
    webpage = 'http://google.com/search?{}'.format(query)
    print('xxxxxxxxxxxxxxxxxxx')
    print(webpage)
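As a quick sanity check of what urlencode produces, the sketch below (Python 3, standard library only) encodes one of the search terms and then parses the result back with parse_qs. Note that urlencode writes spaces as '+' rather than '%20'; both forms are decoded back to a space in a query string.

```python
from urllib.parse import urlencode, parse_qs

# Encode a single key/value pair; spaces become '+', parentheses become %28 and %29
query = urlencode({'q': 'Telejob (ETH)'})
print(query)

# Parsing the query string recovers the original term
print(parse_qs(query))
```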
Evans Murithi
  • I am new. What's the advantage of using this method over the joint method described by JacobIRR below? – skeitel Apr 07 '17 at 18:40
  • Let's say you have special characters `ñ´ç` in your string: plain concatenation with `+` will not encode them, whereas `urllib` will encode them as `q=%C3%B1%C2%B4%C3%A7` – Evans Murithi Apr 07 '17 at 18:44

Neither of these answers addresses the base issue: you need to URL-encode the entire string.

I chose urllib.quote() (Python 2):

>>> import urllib
>>> for term in search_terms:
    print urllib.quote(term)
Telejob%20%28ETH%29
Luisa%20da%20Silva
The%20CERN%20Recruitment%20Services

Notice the () are also encoded, as will any other strange characters that might bork your query.

In your case, it would be:

webpage = 'http://google.com/search?q=' + urllib.quote(el)

The equivalent in Py3:

from urllib import parse
for term in search_terms:
    print(parse.quote(term))

So:

webpage = 'http://google.com/search?q=' + parse.quote(el)
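If the same script has to run under both Python 2 and Python 3, one possible approach (a sketch, not the only way) is to try the Python 3 import first and fall back to the Python 2 location of quote:

```python
try:
    from urllib.parse import quote  # Python 3 location
except ImportError:
    from urllib import quote        # Python 2 location

search_terms = ['Telejob (ETH)', 'Luisa da Silva', 'The CERN Recruitment Services']

# Build one percent-encoded search URL per term
webpages = ['http://google.com/search?q=' + quote(el) for el in search_terms]
for webpage in webpages:
    print(webpage)
```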
TemporalWolf
  • Traceback (most recent call last): File "C:/Users/SK/PycharmProjects/untitled/another_temperase.py", line 13, in print(urllib.quote(el)) AttributeError: module 'urllib' has no attribute 'quote' – skeitel Apr 07 '17 at 18:04
  • @skeitel for Py3, you'll need to `from urllib import parse` and use `parse.quote()` instead or use [urllib.parse.urlencode()](https://docs.python.org/3/library/urllib.parse.html#urllib.parse.urlencode), as Evans Murithi explained – TemporalWolf Apr 07 '17 at 18:07
  • @skeitel The thing is that in Python 3 it's not `urllib.quote()`, but `url.parse.urlencode()` – Juan T Apr 07 '17 at 18:09
  • Although `urllib.parse.quote()` still will work in Py3... I'll update – TemporalWolf Apr 07 '17 at 18:09
  • @JuanT `urlencode()` takes a slightly different input: key, value pairs instead of a pure string. It's probably a better options, but it does require further modification. – TemporalWolf Apr 07 '17 at 18:13
  • My bad, I meant `urllib.parse.quote()` and I got confused – Juan T Apr 07 '17 at 18:16

The thing is that URLs need to be percent-encoded, because some characters have a special meaning in URLs, for example:

  • #: goes to a certain position in the page
  • /: I think you know what this one does...
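To see the effect of # concretely, here is a small sketch (Python 3, standard library only; the search term 'C#basics' is just an illustrative value). It splits a URL built from the raw term and one built from the percent-encoded term, and prints which part ends up in the query and which in the fragment.

```python
from urllib.parse import urlsplit, quote

term = 'C#basics'  # illustrative search term containing a special character

# Unencoded: everything after '#' is treated as a fragment, not part of the query
raw_parts = urlsplit('http://google.com/search?q=' + term)
print(raw_parts.query, '|', raw_parts.fragment)

# Percent-encoded: '#' becomes %23 and the whole term stays in the query
safe_parts = urlsplit('http://google.com/search?q=' + quote(term, safe=''))
print(safe_parts.query, '|', safe_parts.fragment)
```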

You should use quote() to fix that; just remember that:

  • urllib.quote() is for Python2
  • urllib.parse.quote() is for Python3

Here are some examples for Python3:

from urllib.parse import quote


quote('/bars/will/stay/intact')
#'/bars/will/stay/intact'

quote('/bars/wont/stay/intact', safe='')
#'%2Fbars%2Fwont%2Fstay%2Fintact' #With safe='', even '/' is encoded

quote('()ñ´ ç')
#'%28%29%C3%B1%C2%B4%20%C3%A7'

So your code is now:

search_terms = ['Telejob (ETH)', 'Luisa da Silva','The CERN Recruitment Services']
for el in search_terms:
    webpage = 'http://google.com/search?q='+quote(el)
    print('xxxxxxxxxxxxxxxxxxx')
    print(webpage)

As search_terms could include characters that quote() leaves unescaped by default (such as '/'), you'll have to use its safe argument:

search_terms = ['Telejob (ETH)', 'Luisa da Silva','The CERN Recruitment Services']
for el in search_terms:
    webpage = 'http://google.com/search?q='+quote(el, safe='')
    print('xxxxxxxxxxxxxxxxxxx')
    print(webpage)

This last one outputs:

xxxxxxxxxxxxxxxxxxx
http://google.com/search?q=Telejob%20%28ETH%29
xxxxxxxxxxxxxxxxxxx
http://google.com/search?q=Luisa%20da%20Silva
xxxxxxxxxxxxxxxxxxx
http://google.com/search?q=The%20CERN%20Recruitment%20Services

I suggest reading https://docs.python.org/3/library/urllib.parse.html#url-quoting for further information (See? A # character!)

Juan T

Google queries have the format https://www.google.com/search?q=keyword_1+...+keyword_N so you should format your query like so:

search_terms = ["Telejob (ETH)", "Luisa da Silva","The CERN Recruitment Services"]
for search_term in search_terms:
    query = "+".join(search_term.split())
    url = "http://google.com/search?q=" + query
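The split-and-join above only handles whitespace. As a possible variant (a sketch assuming Python 3), urllib.parse.quote_plus produces the same '+' separators while also percent-encoding other special characters such as the parentheses in the first term:

```python
from urllib.parse import quote_plus

search_terms = ["Telejob (ETH)", "Luisa da Silva", "The CERN Recruitment Services"]

# quote_plus() writes spaces as '+' and percent-encodes other reserved characters
urls = ["http://google.com/search?q=" + quote_plus(term) for term in search_terms]
for url in urls:
    print(url)
```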
Robert Valencia