I need to create a search engine that crawls through a list of websites and runs a query on each of them. These websites all return data in various formats and structures, and I need to collect specific information from all of them into a single, uniform structure.

Is there a way I can do that with an existing engine such as Google Custom Search Engine, or am I better off creating one of my own? If so, what's the first step I should take towards learning how to index and search these websites efficiently, without filling up my servers with useless data?

So to sum up, besides running a query through each of these websites' search boxes, I need to handle each site's results appropriately and lay them out in one unified structure in a single place. All the results are to be parsed and extracted into 4-6 fields (unless, of course, there is a way to do this with Google CSE).
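To make the "one unified structure" requirement concrete, here is a minimal Python sketch of the idea: one small adapter per site that maps that site's raw result into a shared record type. The site names, the raw field names (`name`, `link`, `cost`, `headline`, `href`) and the four target fields are all hypothetical placeholders, not taken from any real site.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List


@dataclass
class Record:
    """The common structure every site's results are mapped into (fields are illustrative)."""
    title: str
    url: str
    price: str
    source: str


def from_site_a(raw: Dict[str, Any]) -> Record:
    # Hypothetical site A returns {"name", "link", "cost"}.
    return Record(title=raw["name"], url=raw["link"],
                  price=str(raw["cost"]), source="site_a")


def from_site_b(raw: Dict[str, Any]) -> Record:
    # Hypothetical site B returns {"headline", "href"} and sometimes "price".
    return Record(title=raw["headline"].strip(), url=raw["href"],
                  price=raw.get("price", ""), source="site_b")


# Registry: adding a new site means writing one adapter and registering it here.
ADAPTERS: Dict[str, Callable[[Dict[str, Any]], Record]] = {
    "site_a": from_site_a,
    "site_b": from_site_b,
}


def merge_results(results_by_site: Dict[str, List[Dict[str, Any]]]) -> List[Record]:
    """Convert each site's raw results with its adapter and return one flat list."""
    merged: List[Record] = []
    for site, raw_results in results_by_site.items():
        adapter = ADAPTERS[site]
        merged.extend(adapter(raw) for raw in raw_results)
    return merged
```

Whichever way the raw results are obtained (CSE, each site's own search, or a custom index), only the adapters need to know about a site's format; everything downstream works with `Record`.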

Shimmy Weitzhandler

2 Answers


Google CSE provides some interfaces to the standard Google web search. You can control the user interface and the search parameters, but you have no control over the indexing, nor any direct access to the index data.

You might be more interested in the Google Search APIs that are available with Google App Engine (GAE). These are quite different: they are search services in which you provide the data and control the indexes.

Tom
  • Thanks for your reply. I'm currently using [Google Custom Search API for .NET](https://developers.google.com/api-client-library/dotnet/apis/customsearch/v1). I need to perform searches on about 5 websites. These websites do contain millions of records, but I don't need the entire web covered. Is it an option to make my own search engine? – Shimmy Weitzhandler Apr 07 '14 at 23:41
  • Certainly - that is how they expect you to use CSE: create a CSE in their control panel, and list the 5 websites. And there is an option to say whether you only want those 5, or the whole web but emphasizing those 5. I think CSE is very good but be warned that it is quite expensive if your volume of queries will be high. – Tom Apr 08 '14 at 00:48
  • Do you think ads will pay off the expenses? I mean, many users = many requests = many ad clicks, doesn't it? Also, you didn't answer my last question: does building my own index of these websites look like a relevant option to you (it's between 5 and 20 websites, but each has millions of records)? – Shimmy Weitzhandler Apr 08 '14 at 07:28
  • You can't build your own indexes with CSE, and you can't use the other Search APIs because you are using .NET. And no, the ads won't pay for it. Look here: http://stackoverflow.com/a/22494400/150016 – Tom Apr 08 '14 at 15:46
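For readers who want to see what querying a CSE limited to a few sites looks like at the HTTP level, here is a rough Python sketch against the Custom Search JSON API. The endpoint and the `key`, `cx`, `q`, `start` and `num` parameters are the documented ones; the helper names `build_cse_url` and `extract_items` are made up for illustration, and the sketch only builds the URL and reads an already-decoded response, it does not send the request.

```python
from typing import Any, Dict, List
from urllib.parse import urlencode

# Custom Search JSON API endpoint.
CSE_ENDPOINT = "https://www.googleapis.com/customsearch/v1"


def build_cse_url(api_key: str, engine_id: str, query: str,
                  start: int = 1, num: int = 10) -> str:
    """Build the GET URL for one page of results.

    engine_id is the "cx" value of the CSE created in the control panel,
    which is where the list of websites is configured.
    """
    params = {"key": api_key, "cx": engine_id, "q": query,
              "start": start, "num": num}
    return CSE_ENDPOINT + "?" + urlencode(params)


def extract_items(response: Dict[str, Any]) -> List[Dict[str, str]]:
    """Pull title, link and snippet out of a decoded JSON response.

    The "items" key is absent when there are no results, hence the default.
    """
    return [
        {"title": item.get("title", ""),
         "link": item.get("link", ""),
         "snippet": item.get("snippet", "")}
        for item in response.get("items", [])
    ]
```

Each URL built this way counts as one query against the quota, which is where the cost concern above comes from.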

As of December 2018, Google CSE lets us define the set of websites that our requests search. Google CSE allows up to 2000 website sources to include and up to 5000 sources overall.

A simple comparison:

  • Google CSE provides a strong API, custom requests, and nothing to run on your own server; in contrast, it permits only 100 requests per day for free use.

  • Developing a new search engine could be helpful for small sets of websites, and it provides a search engine customized for the business needs, but it requires time, infrastructure, financial investment, and development of search engine algorithms: indexing, storage, and analysis.
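To give a feel for the "indexing" part of building your own engine, here is a toy in-memory inverted index in Python. It is only a sketch of the basic idea (term to set of document ids, AND-style queries); the class and function names are illustrative, and a real engine would add persistent storage, ranking, and text analysis on top of this.

```python
import re
from collections import defaultdict
from typing import Dict, List, Set


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


class InvertedIndex:
    def __init__(self) -> None:
        # term -> ids of the documents that contain it
        self.postings: Dict[str, Set[int]] = defaultdict(set)
        self.docs: Dict[int, str] = {}

    def add(self, doc_id: int, text: str) -> None:
        """Store the document and register each of its terms."""
        self.docs[doc_id] = text
        for term in tokenize(text):
            self.postings[term].add(doc_id)

    def search(self, query: str) -> List[int]:
        """Return sorted ids of documents containing every query term."""
        terms = tokenize(query)
        if not terms:
            return []
        # Copy the first posting set so the intersection does not mutate the index.
        result = set(self.postings.get(terms[0], set()))
        for term in terms[1:]:
            result &= self.postings.get(term, set())
        return sorted(result)
```

For example, after adding a few documents, `index.search("red bicycle")` returns only the ids of documents that contain both words.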

To sum up, it depends on which of these trade-offs matters more for your needs.

B.Fodil