I am building an index using RediSearch in a multi-tenant application that has:

  • 150,000 tenants
  • Each tenant has on average 3,500 customers
  • Each customer has 10 fields that will be added to the index
  • All of the fields are TextFields.

The question is: what would be best practice (performance, memory/storage, flexibility) in such a case?

Should I create one customer_index with a tenant_code field that identifies which data belongs to which tenant, or should I create a tenant-specific index?

From my current experience and understanding, tenant-specific indexes would mean many indexes with less data in each, and would also give me the flexibility to drop and recreate the index for a specific tenant.

In Python, the code would be as below:

Single Customer Index

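from redisearch import Client, TextField

# One shared index; tenant_code identifies the owning tenant
client = Client('customer_index')
client.create_index(
    [
        TextField('tenant_code'), TextField('last_name'), TextField('first_name'),
        TextField('other_name'),
    ]
)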

Tenant Specific Customer Index

# One index per tenant; 'tenant_code' stands in for the actual tenant's code
client = Client('tenant_code_customer_index')
client.create_index(
    [
        TextField('last_name'), TextField('first_name'), TextField('other_name'),
    ]
)
lukik

1 Answer

Because each tenant has only 3,500 customers (relatively few), you'd be better off memory-wise using one larger index. With so few records, the resource overhead of each index would likely exceed the size of the index itself. Per-tenant indexes would also increase the number of keys in Redis itself, because a new Redis key is created for each indexed term in each index. So if you have ~2,000 unique terms in each index, you will end up with 300M Redis keys (2k * 150k). In contrast, using a single index will leave you with only 2k keys.
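To make the arithmetic above concrete, here is a small sketch using only the figures quoted in the question and in this answer. The ~2,000 unique terms is the rough estimate used above, not a measured value, and the helper name `term_keys` is made up for illustration.

```python
# Figures taken from the question and the answer above.
TENANTS = 150_000
CUSTOMERS_PER_TENANT = 3_500   # average, per the question
UNIQUE_TERMS = 2_000           # rough estimate used in the answer, not measured


def term_keys(index_count: int, unique_terms_per_index: int) -> int:
    """Redis keys used for terms, at one key per indexed term per index."""
    return index_count * unique_terms_per_index


total_documents = TENANTS * CUSTOMERS_PER_TENANT
keys_per_tenant_layout = term_keys(TENANTS, UNIQUE_TERMS)
# Assumes, as the answer does, that the shared index also has about 2,000 terms.
keys_single_layout = term_keys(1, UNIQUE_TERMS)

print(f"documents overall:           {total_documents:,}")
print(f"term keys, index per tenant: {keys_per_tenant_layout:,}")
print(f"term keys, single index:     {keys_single_layout:,}")
```

The document count is the same in both layouts; only the number of term keys changes.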

Performance-wise, there shouldn't be any difference either: the tenant code field is itself backed by an inverted index, so a search is unlikely to have to sift through more records in the larger index.
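As a rough illustration of scoping a search to one tenant in the shared index, the helper below builds a query string that intersects a tenant filter with the user's query. The function name `tenant_query` is hypothetical, the field name `tenant_code` comes from the question's schema, and the sketch assumes tenant codes are single alphanumeric tokens (a text field would tokenize anything else).

```python
def tenant_query(tenant_code: str, user_query: str, field: str = "tenant_code") -> str:
    """Build a RediSearch query string restricted to a single tenant.

    Space-separated clauses are intersected, so the result only matches
    documents whose tenant field contains `tenant_code`.
    """
    # Assumption: tenant codes are one alphanumeric token, so they survive
    # tokenization unchanged and cannot inject extra query syntax.
    if not tenant_code.isalnum():
        raise ValueError(f"unsupported tenant code: {tenant_code!r}")
    return f"@{field}:{tenant_code} ({user_query})"


print(tenant_query("acme42", "@last_name:smith"))
```

The resulting string can be passed to FT.SEARCH against the single customer index.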

For deletion you can simply gather a list of IDs which match a criteria, e.g. "FT.SEARCH idx @tenant:yourcode" and call FT.DEL on each of those records individually. I am assuming that this is not an operation that is being performed every five seconds, so you should be fin there.
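A minimal sketch of that search-then-delete loop is shown below. It assumes `conn` is any object with a redis-py style `execute_command` method, that the tenant field is named `tenant_code` as in the question (the example query above uses `tenant`), and that FT.SEARCH with NOCONTENT replies with the total followed by document IDs. The function name and the batch size are made up for illustration, and whether FT.DEL is the right command depends on your RediSearch version.

```python
def delete_tenant_documents(conn, index, tenant_code, field="tenant_code", batch_size=1000):
    """Delete every document of one tenant from a shared index.

    Returns the number of FT.DEL calls issued.
    """
    query = f"@{field}:{tenant_code}"
    deleted = 0
    while True:
        # Always read from offset 0: documents deleted in the previous
        # round no longer match, so the next batch moves to the front.
        reply = conn.execute_command(
            "FT.SEARCH", index, query, "NOCONTENT", "LIMIT", 0, batch_size
        )
        doc_ids = reply[1:]  # reply[0] is the total number of matches
        if not doc_ids:
            return deleted
        for doc_id in doc_ids:
            conn.execute_command("FT.DEL", index, doc_id)
            deleted += 1


# Example call, assuming a redis-py connection to a server with RediSearch loaded:
#   import redis
#   delete_tenant_documents(redis.Redis(), "customer_index", "yourcode")
```

Deleting in batches keeps each FT.SEARCH reply small, which matters little for a monthly rebuild but avoids pulling thousands of IDs in one response.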

Note that using 150k indexes right now is probably not even possible because a dedicated indexing thread is created for each index (though an option to have indexing performed on a single thread will be available in future releases).

Mark Nunberg
  • Thanks, very insightful. The regeneration of the index will happen once a month, so yes, I should be safe there. One question though: when you say "relatively little", what would count as a lot? RediSearch is relatively new in the market (compared to Elasticsearch etc.), so there aren't many battle stories out there to help gauge this, and I haven't seen much in the non-technical docs of Redis Labs. – lukik Feb 01 '19 at 06:26
  • The key "currencies" that RediSearch deals with are (1) terms and (2) documents. Whether there is a "little" or a "lot" depends largely on how many of each you have inside your index. For example, 2M documents can be a fairly small amount if each document contains only several bytes, but if each document is several MB long, then it might mean a heavier database. – Mark Nunberg Feb 01 '19 at 11:29