A program that gathers search results from Google and Yahoo

Question

I want to search Google and Yahoo for forum and blog posts, limited to a specific country. The results will be saved to a database for sorting and further processing.

From each search result, I need:

the URL itself
date and time
the domain
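
To make the three fields above concrete, here is a minimal sketch of one record. The helper name `toResultRecord` and the field names are illustrative assumptions, not part of any search API, and "date and time" is assumed here to mean when the result was retrieved. The domain is derived from the URL with the standard `URL` class.

```javascript
// Hypothetical helper: builds one database-ready record from a result URL.
// Field names (url, domain, retrievedAt) are assumptions for illustration.
function toResultRecord(resultUrl, retrievedAt = new Date()) {
  // The URL constructor throws a TypeError if resultUrl is not an absolute URL.
  const parsed = new URL(resultUrl);
  return {
    url: parsed.href,                       // the URL itself, normalized
    domain: parsed.hostname,                // host part, e.g. "forum.example.com"
    retrievedAt: retrievedAt.toISOString(), // date and time in ISO 8601 (UTC)
  };
}

// Example with a fixed timestamp so the output is reproducible.
const record = toResultRecord(
  "https://forum.example.com/thread/42?page=2",
  new Date(Date.UTC(2011, 10, 2, 2, 31))
);
console.log(record);
```

Storing the timestamp as an ISO 8601 string keeps it sortable as plain text, which may help with the sorting step mentioned above.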

I am working on a program that accepts keywords as input, automatically searches Google and Yahoo, and saves the results to a database.

function OnLoad() {
  // Create a search control
  var searchControl = new google.search.SearchControl();

  // Add in a full set of searchers
  var localSearch = new google.search.LocalSearch();
  searchControl.addSearcher(localSearch);
  searchControl.addSearcher(new google.search.WebSearch());
  searchControl.addSearcher(new google.search.VideoSearch());
  searchControl.addSearcher(new google.search.BlogSearch());
  searchControl.addSearcher(new google.search.NewsSearch());
  searchControl.addSearcher(new google.search.ImageSearch());
  searchControl.addSearcher(new google.search.BookSearch());
  searchControl.addSearcher(new google.search.PatentSearch());

  // Set the Local Search center point
  localSearch.setCenterPoint("New York, NY");

  // tell the searcher to draw itself and tell it where to attach
  searchControl.draw(document.getElementById("searchcontrol"));

  // execute an initial search
  searchControl.execute("VW GTI");
}
google.setOnLoadCallback(OnLoad);

This code is from the Google AJAX Search API. However, there does not seem to be a way to specify the domain, country, or date and time as search criteria. Moreover, it returns the results as HTML, which is hard to slice up and save to the database as individual search result entries.

EDITED to describe my specific problem.

Too broad, no code; see the FAQ on what is and is not an appropriate question. – , Nov 02 '11 at 02:31

score 2 · Answer 1 · answered Nov 02 '11 at 02:33

Parsing the raw HTML should be your last resort here. If they change the markup, you have to redesign your parser. That is pretty much guaranteed to happen before the "3 years" time period that you have mentioned with Google's AJAX Search API.
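
As a rough illustration of why structured output is easier to work with than markup, here is a minimal sketch that turns a JSON response into rows. The response shape (`{ "results": [ { "url": ... } ] }`) and the function name `extractRows` are assumptions made for this example; a real search API or library will have its own format, so check its documentation.

```javascript
// Hypothetical response shape: { "results": [ { "url": "..." }, ... ] }.
// This is NOT the format of any specific search API.
function extractRows(jsonText) {
  const payload = JSON.parse(jsonText);
  const seen = new Set(); // normalized URLs already emitted
  const rows = [];
  for (const item of payload.results || []) {
    let parsed;
    try {
      parsed = new URL(item.url);
    } catch (err) {
      continue; // skip entries without a usable absolute URL
    }
    if (seen.has(parsed.href)) continue; // drop duplicate results
    seen.add(parsed.href);
    rows.push({ url: parsed.href, domain: parsed.hostname });
  }
  return rows;
}

// Example input: one duplicate and one invalid entry.
const sample = JSON.stringify({
  results: [
    { url: "http://blog.example.org/post-1" },
    { url: "http://blog.example.org/post-1" },
    { url: "not a url" },
  ],
});
const rows = extractRows(sample);
console.log(rows);
```

Nothing here depends on page layout, so a change to the HTML markup would not affect it.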

Jon Newmuis

I agree that parsing HTML is a very bad solution. However, there seems to be no way to store the results programmatically except by relying on third-party libraries, which can be unreliable. – Gapton Nov 02 '11 at 03:50
(a) Third party libraries will be more reliable than HTML scraping. (b) The way you've posed the question, I'm not sure how you would *not* rely on third party sources, given that you want to pull from Google and/or Yahoo. – Jon Newmuis Nov 02 '11 at 03:51