I'm working on a website that has minimal traffic at the moment. It's built using Ruby on Rails and runs on Heroku's cloud platform.
As part of the site, I have a large number of pages that need to be searchable, each of which only has a tiny amount of information on it. Think of a table of articles where each article only needs its title indexed, but there are around 8 million articles.
Postgres Search: When I first started working on this, I used Postgres full text search, but it apparently isn't optimized well enough to handle this many indexed items, and it ran dog slow. Some searches took more than 30 seconds to complete and timed out the database connection.
Websolr: I then moved on to what was at the time the one and only Heroku add-on for cloud search, Websolr by OneMoreCloud. Unfortunately, they charge by the number of items indexed, which is horrible for a site like mine that has no traffic but a large number of items to index. On top of that, performance was arguably worse than Postgres search, which was free. Where Postgres search would time out and bring down the site, Websolr would return an empty or partial result set, making viewers think that the result wasn't in the database.
Index Tank: Heroku has now added another cloud search provider, Index Tank, which is still in beta. While the beta is free, I'm reluctant to try them because the highest plan on their non-Heroku service, which is not free, only covers 2 million documents while already costing an eye-popping $500 a month.
Google Site Search: An option I'm currently looking at is moving over to Google Site Search. The Google search brand gives me confidence that I won't run into the performance issues I had in the past. Their pricing is also extremely reasonable and is based on traffic. On the downside, it's not a truly integrated search: it doesn't hook into the database but only looks at web pages, so as far as I can tell there's no way to restrict a search to, say, articles in the Technical Articles category. Even customizing the appearance of the search results seems like kind of a pain. I'd have to parse the search results from XML and use them to generate my search results page, and if I wanted to display metadata, I'd have to use the parsed results to look up each result's row in my database.
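To make that parse-and-lookup step concrete, here is a rough sketch of the glue code it would seem to require. Everything specific in it is an assumption for illustration: the XML element names (`R`, `U`, `T`), the `/articles/:id` URL scheme, and the helper names `article_ids_from_results` and `in_search_order` are hypothetical, and a plain hash stands in for the database.

```ruby
require "rexml/document"
require "uri"

# Hypothetical sample response. The element names (R for a result, U for its
# URL, T for its title) are an assumption about the XML format.
SAMPLE_XML = <<~XML
  <GSP>
    <RES>
      <R N="1"><U>http://example.com/articles/42</U><T>First title</T></R>
      <R N="2"><U>http://example.com/articles/7</U><T>Second title</T></R>
      <R N="3"><U>http://example.com/about</U><T>About</T></R>
    </RES>
  </GSP>
XML

# Pull article ids out of the result URLs, preserving the ranking order.
# Assumes a hypothetical /articles/:id URL scheme; other URLs are skipped.
def article_ids_from_results(xml)
  doc = REXML::Document.new(xml)
  ids = []
  REXML::XPath.each(doc, "//R/U") do |url_node|
    path = URI.parse(url_node.text.to_s.strip).path
    match = path.match(%r{\A/articles/(\d+)\z})
    ids << match[1].to_i if match
  end
  ids
end

# Reorder database rows to match the ranking returned by the search service,
# dropping ids that have no matching row.
def in_search_order(ids, rows_by_id)
  ids.map { |id| rows_by_id[id] }.compact
end

ids = article_ids_from_results(SAMPLE_XML)

# In a Rails app the rows might come from something like
# Article.where(id: ids).index_by(&:id); a plain hash stands in here.
rows_by_id = { 7 => "Article 7", 42 => "Article 42" }
ordered = in_search_order(ids, rows_by_id)
puts ordered.inspect
```

The extra database round trip on every search, plus the need to keep the results in the service's ranking order, is the part that makes this feel less integrated than a search that queries the database directly.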
Are there any good cloud or third-party search providers out there that the Stack Overflow community would recommend?