I'm trying to do something very simple: get the URL of the top result of a Google search. For example, for the search LMFAO Yes at Pandora (the "at" is intentional) I want to receive the link www.pandora.com/music/song/lmfao/yes. All three major search engines return this result without any trouble.

The standard response on Stack Overflow seems to be to use the Google AJAX API but, to be blunt, it's total crap. It returns completely weird results, e.g.:

http://lyrics.wikia.com/LMFAO:Yes
http://www.pandora.com/music/artist/lmfao
http://www.pandora.com/music/song/goo%2Bgoo%2Bdolls/black%2Bballoon
http://new.music.yahoo.com/lmfao/

Unfortunately, no amount of tinkering with the query has brought me any success, and it gets even worse for more obscure searches, like Andrea Bauer at ilike.

Bing and Yahoo also provide APIs, but they require a developer ID, which rules them out for a program I want to distribute.

I've also found this suggestion, which parses the standard Google results page, but it doesn't seem to work anymore.

The source code of the Google results page in my web browser looks unsearchable to me.

Is there anything I can do to get the 'real' Google results?

Kami

3 Answers

This library might be of interest to you: http://www.catonmat.net/blog/python-library-for-google-search/

>>> from xgoogle.search import GoogleSearch, SearchError
>>> try:
...   gs = GoogleSearch("LMFAO Yes at Pandora")
...   gs.results_per_page = 50
...   results = gs.get_results()
...   for res in results:
...     print res.title.encode("utf8")
...     print res.desc.encode("utf8")
...     print res.url.encode("utf8")
...     print
... except SearchError, e:
...   print "Search failed: %s" % e
... 
Yes - LMFAO - Pandora Internet Radio
Information about Yes - LMFAO at Pandora.com. Pandora is the Internet radio service that helps you find new music based on your old and current favorites.
http://www.pandora.com/music/song/lmfao/yes

LMFAO - Pandora Internet Radio
Listen and find out more about LMFAO at Pandora.com. Pandora is the Internet ...
http://www.pandora.com/music/artist/lmfao

search engine - Getting the 'real' Google results with Python ...
I'm trying to do something very simple, get the top result URL of a Google search, f.e. for the search LMFAO Yes at Pandora ( at is intentional here) I want ...
http://stackoverflow.com/questions/5361735/getting-the-real-google-results-with-python

LMFAO:Yes Lyrics - LyricWiki - Music lyrics from songs and albums
This song is performed by LMFAO and appears on the album Party Rock (2009).LMFAO:Yes ... Pandora: search for… LMFAO • Yes. Wikipedia: search for… ...
http://lyrics.wikia.com/LMFAO:Yes

YouTube - LMFAO - YES
Oct 6, 2007 ... JUST ANOTHER DAY IN THE CAR 4 LMFAO... OFFICIAL LMFAO Myspace  - http://lmfaomusic.com http://partyrocklife.com/ for OFFICIAL Party Rock gear ...
http://www.youtube.com/watch?v=nXPT8sw_FjU

Yes Lyrics - Lmfao
The group LMFAO goes double platnium hayyy. I got a party man that's how I live. So I take my elevator to the club in my crib like [Chorus:] Yes it's on and ...
http://www.6lyrics.com/yes10-lyrics-lmfao.aspx

LMFAO - YES LYRICS
1 post - Last post: Jun 3, 2010Lmfao Yes lyrics in the Party Rock Album. These Yes lyrics are performed by Lmfao Get the music video and song lyrics here.
http://www.metrolyrics.com/yes-lyrics-lmfao.html

Easy ChickHEN CFW installer (Without Pandora)
2 posts - 1 author - Last post: Jul 9, 2009Nice post! Smile I have the... Motherboard: TA-079v1, Model: Phat 100x, Hackable: yes, and Creates Pandora: yes. LMFAO! lol! Razz Very Happy ...
http://pspcoding.darkbb.com/t379-easy-chickhen-cfw-installer-without-pandora

Links on "Party Rock" | Facebook
EXCLUSIVE: Jamie Foxx Interview at LMFAO's "YES" Video Shoot - BVTV "Band of the ... Pandora Hau fuck yes party people! lmfao madness ...
http://www.facebook.com/posted.php?id=139241478710&share_id=145095215520613&comments=1

Pandora stays hot with investors - YEA!!! LMFAO!!! - ba.broadcast ...
Aug 31, 2010 ... Subject: Re: Pandora stays hot with investors - YEA!!! LMFAO! ... said that yes, Pandora was running an ad about every half hour. ...
http://groups.google.com/group/ba.broadcast/browse_thread/thread/947fb52042cd8211/41f25665629133cc?show_docid=41f25665629133cc

>>>
dting

I have already done this in Groovy and Java. The idea is to request the Google results page through its URL, then build a DOM document from the returned HTML (TagSoup in Java; I don't know the Python equivalent). After that, use XPath to collect the links from the DOM document, so you can see where the site you care about ranks on the Google results page.
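
For the Python side, here is a minimal sketch of the same idea (Python 3). It is not the TagSoup and XPath approach itself: it swaps that step for the standard library's `html.parser`, which tolerates messy HTML and is enough to collect links in page order. The names `LinkCollector` and `rank_of` and the sample markup are made up for illustration; Google's real results page looks different and changes over time, so the filtering would need adapting.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class LinkCollector(HTMLParser):
    """Collect the href of every absolute <a> link, in document order."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        # Skip relative links such as navigation and settings entries.
        if href and href.startswith(("http://", "https://")):
            self.links.append(href)


def rank_of(html, domain):
    """Return (1-based position, URL) of the first link on `domain`, or None."""
    collector = LinkCollector()
    collector.feed(html)
    for position, url in enumerate(collector.links, start=1):
        host = urlparse(url).netloc
        if host == domain or host.endswith("." + domain):
            return position, url
    return None


# Hypothetical, heavily simplified results page (assumption, not real markup).
SAMPLE_HTML = """
<html><body>
<h3><a href="http://lyrics.wikia.com/LMFAO:Yes">LMFAO:Yes Lyrics</a></h3>
<h3><a href="http://www.pandora.com/music/song/lmfao/yes">Yes - LMFAO</a></h3>
<a href="/preferences">Settings</a>
</body></html>
"""

print(rank_of(SAMPLE_HTML, "pandora.com"))
```

On the sample page this reports the Pandora link in second position. Against a live page you would pass in the HTML you downloaded instead of `SAMPLE_HTML`.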

Welcome to black hat SEO?

Peter Mortensen
Alexis
  • @peter No, I was just surprised that someone would put in the effort to make cosmetic changes to a (rather useless) post in a pretty much dead thread. – Kami Mar 26 '11 at 00:39

Unfortunately, xgoogle has stopped working for me, but I've found out that Bing doesn't mind you scraping its results as long as you use its RSS feed, found here.

So now I'm just regexing the results page obtained through urlopen.
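
Roughly, that approach looks like the sketch below (Python 3). The feed URL is deliberately left as a parameter, and the sample feed is made up to show the general RSS 2.0 shape, so treat `extract_links`, `fetch_links` and `SAMPLE_RSS` as illustrative names rather than anything Bing documents.

```python
import re
from html import unescape
from urllib.request import urlopen

# First <link> inside each <item>; DOTALL lets an item span several lines.
ITEM_LINK_RE = re.compile(r"<item>.*?<link>(.*?)</link>", re.DOTALL)


def extract_links(rss_text):
    """Return the result URLs of an RSS feed, in feed order."""
    return [unescape(link.strip()) for link in ITEM_LINK_RE.findall(rss_text)]


def fetch_links(feed_url):
    """Download a feed with urlopen and extract its result URLs."""
    with urlopen(feed_url) as response:
        text = response.read().decode("utf-8", errors="replace")
    return extract_links(text)


# Made-up sample feed (assumption); a real feed carries more fields per item.
SAMPLE_RSS = """<?xml version="1.0"?>
<rss version="2.0"><channel>
<title>sample feed</title>
<link>http://example.com/feed</link>
<item><title>Yes - LMFAO</title>
<link>http://www.pandora.com/music/song/lmfao/yes</link></item>
<item><title>LMFAO</title>
<link>http://www.pandora.com/music/artist/lmfao?a=1&amp;b=2</link></item>
</channel></rss>
"""

links = extract_links(SAMPLE_RSS)
print(links[0])  # top result of the sample feed
```

The regex skips the channel-level `<link>` because it only matches inside `<item>` elements. It does assume every item contains a link; if that ever fails to hold, a real XML parser such as `xml.etree.ElementTree` would be the safer choice.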

Kami