
Let's say we have a document that can have 50-60 fields. For example:

{  
  "priceCurrency": "USD",
  "price": "119.99",
  "priceValidUntil": "2020-11-05",
  "brand": "Acme",
  "logo": "http://www.example.com/logo.png",
  "name": "WidgetPress",
  "category": "Widgets",
  "image": "http://www.example.com/image.jpg",
  "description": "This is an excellent widget with 21 features and 4 colors."
}

Now, out of those 60 fields, we can query on 20-25. So that means we would need indexes on all of those fields.

Queries formed on these fields can have equality and inequality operators, and can also combine conditions with AND/OR/NOT.
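
For illustration, a typical filter might look something like this (written in MongoDB syntax purely as an example; the values are made up):

{
  "$and": [
    { "priceCurrency": "USD" },
    { "price": { "$lt": 150 } },
    { "$or": [
        { "brand": "Acme" },
        { "category": { "$ne": "Widgets" } }
      ]
    }
  ]
}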

There will be no full-text search requirement.

For such a scenario, would Elasticsearch or MongoDB be preferred?

I read in one article that supporting more than 5 indexes in MongoDB could make writes slow. So in such cases, ES should be preferred.

Our data would be around 200 GB. RPS would be 20,000.

Edit: Just wondering whether ScyllaDB would make sense for this use case?

rajya vardhan
  • I don't think this statement is true: `more than 5 indexes in mongo could make writes slow`. That being said, I think you can search specific fields in both `mongoDB` and `Elasticsearch`. But `Elasticsearch` offers more features like a `hot-warm` architecture, easier scaling, etc. Look at the features that each DB (I should not call Elastic a DB) offers and see what budget works for you. – JBone Apr 13 '20 at 20:34

1 Answer


Although not exactly the answer to your question, this should give you some idea of how to choose a system based on functional and non-functional requirements.

Coming to your main requirement: you do not need to index fields on which you are not doing any searches. Using the index mapping option, you can disable indexing of a particular field (note: the default is true). This reduces the size of your inverted index and improves performance.
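
For example, a minimal mapping sketch (the index name products and the field types are assumptions on my part, not something from your question) where only the queried fields stay indexed could look like:

PUT /products
{
  "mappings": {
    "properties": {
      "brand":           { "type": "keyword" },
      "price":           { "type": "double" },
      "priceValidUntil": { "type": "date" },
      "description":     { "type": "text", "index": false },
      "logo":            { "type": "keyword", "index": false }
    }
  }
}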

The second part is to use the filter context: since you don't have any full-text search requirement, and Elasticsearch caches frequently used filters, queries in the filter context become very fast without introducing an external cache system like Redis, etc.

More information on the filter cache can be found in the official doc at https://www.elastic.co/guide/en/elasticsearch/reference/current/query-filter-context.html#filter-context. Quoting from the same doc:

Frequently used filters will be cached automatically by Elasticsearch, to speed up performance.

Also,

In a filter context, a query clause answers the question “Does this document match this query clause?” The answer is a simple Yes or No — no scores are calculated. Filter context is mostly used for filtering structured data, e.g.

Does this timestamp fall into the range 2015 to 2016? Is the status field set to "published"?

This seems to match your requirements exactly.
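
To keep everything in filter context, a search could look something like the below sketch (again, the index name and values are just placeholders; the term, range, and must_not clauses correspond to your equality, inequality, and NOT operators):

POST /products/_search
{
  "query": {
    "bool": {
      "filter": [
        { "term":  { "brand": "Acme" } },
        { "range": { "price": { "lte": 150 } } }
      ],
      "must_not": [
        { "term": { "priceCurrency": "EUR" } }
      ]
    }
  }
}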

Amit
  • Thank you for a great answer!! What's your view on too many indexes in mongo? – rajya vardhan Apr 14 '20 at 18:43
  • @rajyavardhan, I've not worked with mongo but have worked a lot with DynamoDB, which is again a document-based NoSQL store and, like most NoSQL stores, best for key-based retrieval. Please refer to this official doc https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/bp-indexes-general.html which clearly says `Keep the number of indexes to a minimum. Don't create secondary indexes on attributes that you don't query often. Indexes that are seldom used contribute to increased storage and I/O costs without improving application performance.` and I believe it would be the same in the case of mongo. – Amit Apr 15 '20 at 00:23