I want to search Google and Yahoo for forum and blog posts, limited to a specific country. The results will be saved to a database for sorting and further processing.

From each search result, I need:

  • the URL itself
  • date and time
  • the domain

I am working on a program that accepts keywords as input, automatically searches Google and Yahoo, and saves the results to a database.

function OnLoad() {
  // Create a search control
  var searchControl = new google.search.SearchControl();

  // Add in a full set of searchers
  var localSearch = new google.search.LocalSearch();
  searchControl.addSearcher(localSearch);
  searchControl.addSearcher(new google.search.WebSearch());
  searchControl.addSearcher(new google.search.VideoSearch());
  searchControl.addSearcher(new google.search.BlogSearch());
  searchControl.addSearcher(new google.search.NewsSearch());
  searchControl.addSearcher(new google.search.ImageSearch());
  searchControl.addSearcher(new google.search.BookSearch());
  searchControl.addSearcher(new google.search.PatentSearch());

  // Set the Local Search center point
  localSearch.setCenterPoint("New York, NY");

  // tell the searcher to draw itself and tell it where to attach
  searchControl.draw(document.getElementById("searchcontrol"));

  // execute an initial search
  searchControl.execute("VW GTI");
}
google.setOnLoadCallback(OnLoad);

This code is from the Google AJAX Search API. However, there seems to be no way to specify the domain, country, or date and time as search criteria. Moreover, it returns the results as HTML, which is hard to slice up and save to the database as individual search result entries.

EDITED to describe my specific problem.

tshepang
Gapton

1 Answer

Parsing the raw HTML should be your last resort here. If they change the markup, you have to redesign your parser, and that is pretty much guaranteed to happen within the "3 years" time period you mentioned for Google's AJAX Search API.
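To illustrate why structured results are easier to work with than HTML, here is a minimal sketch. It assumes each result arrives as a plain object with a `url` field; that shape, and the names `toRecord` and `filterByCountryTld`, are hypothetical and not the response format of any particular search API. It derives the domain with the standard `URL` class and stamps each record with the time it was recorded, which gives the three fields you listed.

```javascript
// Hypothetical result shape: { url: string }.
// This is an assumption for illustration, not any search API's real format.

// Turn one structured search result into the row to be stored:
// the URL, its domain, and the time the result was recorded.
function toRecord(result, now = new Date()) {
  const parsed = new URL(result.url); // throws TypeError on an invalid URL
  return {
    url: parsed.href,
    domain: parsed.hostname,
    retrievedAt: now.toISOString(),
  };
}

// Crude country filter: keep records whose domain ends in the given ccTLD.
// Only a heuristic; many sites aimed at one country use .com or other TLDs.
function filterByCountryTld(records, tld) {
  const suffix = "." + tld.toLowerCase();
  return records.filter((r) => r.domain.endsWith(suffix));
}

// Sample input with made-up URLs.
const sample = [
  { url: "http://forum.example.hk/thread/42" },
  { url: "https://blog.example.com/post?id=7" },
];
const records = sample.map((r) => toRecord(r, new Date(0)));
console.log(filterByCountryTld(records, "hk"));
```

Each record is already a flat object, so inserting it into a database table with `url`, `domain`, and `retrievedAt` columns needs no markup parsing. Note that `retrievedAt` is when you stored the result, not when the page was published; a publication date would have to come from the search provider, if it exposes one.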

Jon Newmuis
  • I agree that parsing HTML is a very bad solution. However, there seems to be no way to store the results programmatically except by relying on third-party libraries, which can be unreliable. – Gapton Nov 02 '11 at 03:50
  • (a) Third-party libraries will be more reliable than HTML scraping. (b) The way you've posed the question, I'm not sure how you would *not* rely on third-party sources, given that you want to pull from Google and/or Yahoo. – Jon Newmuis Nov 02 '11 at 03:51