I need to create a search engine that crawls through a list of websites and searches them for a query. These websites all return data in different formats and structures, and I need to collect specific pieces of information from all of them into a single, uniform structure.
Is there a way I can do that with an existing engine such as Google Custom Search Engine? Or am I better off building my own? If so, what's the first step I should take towards learning how to index and search these websites efficiently, without filling up my servers with useless data?
So to sum up: besides running the query through each website's own search box, I need to handle the results from each site appropriately and merge them into one unified structure in a single place. All the results are to be parsed and extracted into 4-6 fields (unless, of course, there is a way to do this with Google CSE).
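To make the "unified structure" concrete, here is a rough sketch of the kind of thing I have in mind, in Python with requests and BeautifulSoup. The site URLs, CSS selectors, and field names below are placeholder assumptions, not real sites; the point is just one parser per site, all normalizing into the same record:

```python
from dataclasses import dataclass, asdict
from typing import Callable, Dict, List

import requests
from bs4 import BeautifulSoup


@dataclass
class Result:
    # The common structure every site's results get normalized into
    # (4-6 fields; these names are just examples).
    source: str
    title: str
    url: str
    price: str
    description: str


def parse_site_a(html: str) -> List[Result]:
    """Site-specific parser: knows the markup of example-a.com only."""
    soup = BeautifulSoup(html, "html.parser")
    results = []
    for item in soup.select("div.search-result"):  # hypothetical selector
        results.append(Result(
            source="example-a.com",
            title=item.select_one("h2").get_text(strip=True),
            url=item.select_one("a")["href"],
            price=item.select_one(".price").get_text(strip=True),
            description=item.select_one(".desc").get_text(strip=True),
        ))
    return results


# One parser per site; the search URL templates are placeholders.
SITES: Dict[str, Callable[[str], List[Result]]] = {
    "https://example-a.com/search?q={query}": parse_site_a,
    # "https://example-b.com/find?term={query}": parse_site_b, ...
}


def federated_search(query: str) -> List[dict]:
    """Query every site's own search endpoint and merge the parsed results."""
    merged: List[dict] = []
    for url_template, parser in SITES.items():
        response = requests.get(url_template.format(query=query), timeout=10)
        response.raise_for_status()
        merged.extend(asdict(r) for r in parser(response.text))
    return merged
```

The question is whether something like this per-site parsing layer is unavoidable, or whether an existing service like Google CSE can give me structured fields directly.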