
It's relatively easy to parse the output of the AJAX API using a scripting language:

#!/usr/bin/env python3

import json
import urllib.parse
import urllib.request

base = 'http://ajax.googleapis.com/ajax/services/search/web?v=1.0&'
query = urllib.parse.urlencode({'q': 'something'})
response = urllib.request.urlopen(base + query).read()
data = json.loads(response)
print(data['responseData']['results'][0]['url'])

But are there better ways to do something similar with just basic shell scripting? If you simply fetched the API page with curl, how would you encode the URL parameters and parse the JSON?

Lri
  • Is there a problem with your current escaping? I don't see any problem with using urlencode. – Xepo Apr 01 '11 at 02:38
  • @Xepo Just that it depends on Python. But based on [URLEncode from a bash script - Stack Overflow](http://stackoverflow.com/questions/296536/urlencode-from-a-bash-script) it does really seem like one of the most reasonable methods. – Lri Apr 01 '11 at 14:49
  • @Lri Is this a bash or a Python question? Please consider removing the [bash] tag – nhed Apr 02 '11 at 02:35
  • @nhed I'm trying to replace the Python example with a bash script. – Lri Apr 02 '11 at 15:27
  • Maybe just use [`googlecl`](http://code.google.com/p/googlecl/)? – Daenyth May 04 '11 at 18:49
  • @Daenyth It doesn't support search (yet?). – Lri May 05 '11 at 17:50
  • @Lri: Hmm, so it doesn't. I need moar rtfm. – Daenyth May 05 '11 at 17:51
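As the comments note, URL-encoding is the sticking point for a pure-shell solution. A minimal bash encoder could look like this (a sketch, not from the thread; it assumes single-byte ASCII input and bash, not plain sh):

```shell
# Percent-encode a string for use in a URL query (sketch; ASCII input assumed).
urlencode() {
  local s="$1" out="" c i
  for (( i = 0; i < ${#s}; i++ )); do
    c="${s:i:1}"
    case "$c" in
      [a-zA-Z0-9.~_-]) out+="$c" ;;                # unreserved characters pass through
      *) printf -v c '%%%02X' "'$c"; out+="$c" ;;  # everything else becomes %XX
    esac
  done
  printf '%s\n' "$out"
}

urlencode "something with spaces&ampersands"
# prints: something%20with%20spaces%26ampersands
```

The `"'$c"` argument uses the POSIX printf convention that a leading quote yields the character's numeric code.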

6 Answers


I ended up using curl's --data-urlencode option to encode the query parameter and plain sed to extract the first result.

curl -s --get --data-urlencode "q=example" 'http://ajax.googleapis.com/ajax/services/search/web?v=1.0' | sed 's/"unescapedUrl":"\([^"]*\).*/\1/;s/.*GwebSearch",//'
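Since the old AJAX API is long gone, the sed extraction can be demonstrated on a canned response fragment instead (the sample below is hypothetical data shaped like the old responseData result objects):

```shell
# Canned fragment shaped like the old API's result objects (sample data, not a live response)
json='{"GsearchResultClass":"GwebSearch","unescapedUrl":"http://example.com/","title":"Example"}'
printf '%s\n' "$json" | sed 's/.*"unescapedUrl":"\([^"]*\)".*/\1/'
# prints: http://example.com/
```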

Lri
  • 26,768
  • 8
  • 84
  • 82
  • This solution seems to only return the last of 4 results for me – katbyte Oct 12 '11 at 06:28
  • `The Google Web Search API is no longer available. Please migrate to the Google Custom Search API (https://developers.google.com/custom-search/)` – Tom Hale Sep 17 '17 at 15:07

Many years later: you can install googler.

googler -n 1 -c in -l en search something here --json

You can control the number of results with the -n flag.

To get only the URL, pipe the output to:

grep '"url"' | tr -s ' ' | cut -d ' ' -f3 | tr -d '"'
once
  • An easier way to extract just the URL is via [jq](https://stedolan.github.io/jq/): `jq -r '.[].url'`. – ib. Mar 29 '18 at 10:37
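The jq approach from the comment, applied to a canned sample (the field names here are assumptions based on the grep/cut pipeline above, not verified googler output):

```shell
# Sample shaped like googler --json output (canned data, shape assumed)
json='[{"title":"Example Domain","url":"https://example.org/"}]'
printf '%s\n' "$json" | jq -r '.[].url'
# prints: https://example.org/
```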

@Lri - Here is a script I personally use with my command-line tools and scripts. It uses the command-line utility lynx to dump the URLs. Here is the code for your reference:

#!/bin/bash

clear
echo ""
echo ".=========================================================."
echo "|                                                         |"
echo "|  COMMAND LINE GOOGLE SEARCH                             |"
echo "|  ---------------------------------------------------    |"
echo "|                                                         |"
echo "|  Version: 1.0                                           |"
echo "|  Developed by: Rishi Narang                             |"
echo "|  Blog: www.wtfuzz.com                                   |"
echo "|                                                         |"
echo "|  Usage: ./gocmd.sh <search strings>                     |"
echo "|  Example: ./gocmd.sh example and test                   |"
echo "|                                                         |"
echo ".=========================================================."
echo ""

if [ -z "$1" ]
then
 echo "ERROR: No search string supplied."
 echo "USAGE: ./gocmd.sh <search string>"
 echo ""
 echo -n "Anyway, for now, supply the search string here: "
 read SEARCH
else
 SEARCH="$*"
fi

URL="http://google.com/search?hl=en&safe=off&q="
STRING=$(echo "$SEARCH" | sed 's/ /%20/g')
URI="$URL%22$STRING%22"

lynx -dump "$URI" > gone.tmp
sed 's/http/\^http/g' gone.tmp | tr -s "^" "\n" | grep http | sed 's/ .*//' > gtwo.tmp
rm gone.tmp
sed '/google.com/d' gtwo.tmp > urls
rm gtwo.tmp

echo "SUCCESS: Extracted $(wc -l < urls) URLs and listed them in '$(pwd)/urls' for reference."
echo ""
cat urls
echo ""

#EOF
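The URL-extraction stage of the script can be exercised on a canned line (sample input, not real lynx output): every `http` gets a `^` prefix, the `^` markers become newlines, and everything after the first space on each URL line is trimmed.

```shell
# Sample line standing in for a lynx -dump reference (canned data)
line='   1. http://example.com/a  References http://example.org/b'
printf '%s\n' "$line" | sed 's/http/\^http/g' | tr -s '^' '\n' | grep http | sed 's/ .*//'
# prints:
# http://example.com/a
# http://example.org/b
```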
r-n

Untested approach, as I don't have access to a Unix box at the moment...

Assuming "test" is the query string, you could use a simple wget on the following URL: http://www.google.co.in/#hl=en&source=hp&biw=1280&bih=705&q=test&btnI=Google+Search&aq=f&aqi=g10&aql=&oq=test&fp=3cc29334ffc8c2c

This would leverage Google's "I'm feeling lucky" functionality and wget the first url for you. You may be able to clean up the above url a bit too.
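Trimmed to the parameters that appear to matter (an assumption: just q and btnI, with the query already URL-encoded), building the URL is a one-liner:

```shell
# Build a minimal "I'm feeling lucky" URL (assumes the query is already percent-encoded)
q='test'
printf 'http://www.google.co.in/search?btnI=Google+Search&q=%s\n' "$q"
# prints: http://www.google.co.in/search?btnI=Google+Search&q=test
```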

qwerty
  • Thanks, worked for me without `biw`, `bih`, and `fp` parameters. – meaning-matters Apr 06 '13 at 11:04
  • Yeah, only the btnI parameter appears to be relevant. This works for me as a single-line #!/bin/sh shell script, where I am restricting the search to developer.mozilla.org (probably not your use case, but whatever -- I have this saved as "jsdoc"): `exec firefox "https://www.google.com/search?btnI=Google+Search&q=site%3Adeveloper.mozilla.org+$1"` – sfink Jul 29 '17 at 18:52

Just for reference: By November 2013, you will need to replace the ajax.googleapis.com/ajax/services/search/web calls completely.

Most likely, it has to be replaced with a Custom Search Engine (CSE). The problem is that you won't be able to get "global" results from CSE. Here is a nice tip on how to do this: http://groups.google.com/a/googleproductforums.com/d/msg/customsearch/0aoS-bXgnEM/lwlZ6_IyVDQJ.

ib.
neverlastn

Lri's answer only returned the last result for me, and I needed the top one, so I changed it to:

JSON=$(curl -s --get --data-urlencode "q=QUERY STRING HERE" 'http://ajax.googleapis.com/ajax/services/search/web?v=1.0' | python -mjson.tool)
response=$(echo "$JSON" | sed -n -e 's/^.*responseStatus\": //p')
if [ "$response" -eq 200 ] ; then
    url=$(echo "$JSON" | egrep "unescapedUrl" | sed -e '1!d' -e "s/^.*unescapedUrl\": \"//" -e "s/\".*$//")
    echo "Success! [$url]"
    wget "$url"
else
    echo "FAILED! [$response]"
fi

It's not as compact as I'd like, but I was in a rush.

katbyte