We have around 100 GB of data from different entities.
There are dimension, fact, and transaction entities; the transaction entities run to around 50 million rows.
All of this data is stored in Blob Storage, with one CSV file per entity.
A copy will be kept in In-Role Cache for quick access.
In addition, incoming transactions must be processed using LINQ joins against other entities, and the resulting inserts, updates, and deletes applied both to the entities in the cache and to the data in Blob Storage.
Is it a good idea to store each entity as a single list (one cache object per entity), or to store each item (row) of an entity as a separate cache entry?
What is the usual practice in this kind of caching scenario?
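To make the two options concrete, here is a minimal sketch of both layouts. A plain dictionary stands in for the cache client (In-Role Cache's DataCache exposes the same Put/Get shape), and the Transaction type and key format are hypothetical:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical row type; the real entities come from the CSV files.
class Transaction
{
    public long Id { get; set; }
    public decimal Amount { get; set; }
}

class CacheLayoutSketch
{
    // Stand-in for the cache client; In-Role Cache's DataCache
    // offers the same Put(key, value) / Get(key) shape.
    static readonly Dictionary<string, object> cache =
        new Dictionary<string, object>();

    static void Main()
    {
        var rows = new List<Transaction>
        {
            new Transaction { Id = 1, Amount = 99.5m },
            new Transaction { Id = 2, Amount = 12.0m },
        };

        // Option 1: one cache object per entity. LINQ joins can run
        // directly over the list, but every read deserializes, and every
        // update rewrites, the whole multi-million-row object.
        cache["Transactions"] = rows;
        var all = (List<Transaction>)cache["Transactions"];

        // Option 2: one cache entry per row, keyed "<entity>:<id>".
        // Single-row reads and writes are cheap, but a LINQ join must
        // first reassemble the entity from its keys.
        foreach (var t in rows)
            cache["Transactions:" + t.Id] = t;
        var one = (Transaction)cache["Transactions:2"];

        Console.WriteLine(all.Count + " rows; row 2 amount = " + one.Amount);
    }
}
```

The usual compromise is to match granularity to access pattern: whole-entity caching for small, read-mostly dimension data, and per-item (or per-partition) caching for large entities such as the transactions, where single-row updates dominate.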
- With 100GB of data and apparently non-trivial queries, it sounds like you should use a storage system designed for that: a database. Possibly SQL Server. CSV and in-memory queries sound like a really bad way to go. – usr Jul 23 '14 at 11:19
- I understand that, but this started as a possible architecture for providing data access with very low latency, using the capabilities of high-compute Azure A9 VMs with 16 cores and 112 GB of RAM. – Srinivas Jul 23 '14 at 11:29
- OK, you can't use a cache, because a cache can be cleared at any moment in time. What latency requirements do you have? SQL Server answers simple queries in 300 microseconds plus network latency (~1 ms?). – usr Jul 23 '14 at 11:47
- The focus is on leveraging cloud technologies, and Azure specifically. SQL Azure was tried for a POC and had performance and connection issues. SQL Server on an Azure VM is another alternative, but it is not good at scaling out. The cache can be cleared, but it is not the only copy; In-Role Cache is a distributed cache, which gives a good scale-out mechanism. – Srinivas Jul 23 '14 at 12:14
- Are you only querying by key and getting a single row? That's the perfect case for Azure Tables. Tables is so simple that I have a hard time believing that it would not perform to satisfaction. – usr Jul 23 '14 at 12:23 (a point-lookup sketch follows this thread)
- Data access is provided through an OData service. Bulk data, in the range of a few million rows, will be downloaded through OData queries and used for further analysis. – Srinivas Jul 23 '14 at 12:28
- So what about Azure Tables? – usr Jul 23 '14 at 12:29
- Azure Tables as in 'Azure Table Storage', right? Not SQL Azure tables, just to confirm. – Srinivas Jul 23 '14 at 12:30
- SQL Azure has frequent connection timeout issues due to throttling. – Srinivas Jul 23 '14 at 12:32
- There are mitigation strategies for throttling (see the retry sketch below); you are not the only customer on the planet facing throttling. It must be possible to cope with it; if not, the product would be unusable for basically everyone, which is not the case. It sounds like you need to evaluate the available services in more detail. If you decide based on false assumptions, you will decide on the wrong solution. – usr Jul 23 '14 at 12:39
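To make the Azure Tables suggestion concrete: a point lookup by (PartitionKey, RowKey) with the storage SDK of that era (Microsoft.WindowsAzure.Storage) looks roughly like the sketch below. The entity type, key scheme, and connection string are made-up illustrations, not anything from the question:

```csharp
using System;
using Microsoft.WindowsAzure.Storage;
using Microsoft.WindowsAzure.Storage.Table;

// Hypothetical entity: partitioned by product, row key = transaction id.
class TransactionEntity : TableEntity
{
    public TransactionEntity() { }
    public TransactionEntity(string productId, string txId)
    {
        PartitionKey = productId;
        RowKey = txId;
    }
    public double Amount { get; set; }
}

class TableLookupSketch
{
    static void Main()
    {
        // Assumption: local storage emulator; use a real connection string in Azure.
        var account = CloudStorageAccount.Parse("UseDevelopmentStorage=true");
        var table = account.CreateCloudTableClient()
                           .GetTableReference("transactions");
        table.CreateIfNotExists();

        table.Execute(TableOperation.InsertOrReplace(
            new TransactionEntity("product-10", "tx-1") { Amount = 99.5 }));

        // Point lookup by (PartitionKey, RowKey): the access pattern
        // Azure Tables is optimized for.
        var result = table.Execute(
            TableOperation.Retrieve<TransactionEntity>("product-10", "tx-1"));
        var tx = (TransactionEntity)result.Result;
        Console.WriteLine(tx.Amount);
    }
}
```

Note the trade-off: Tables answers keyed lookups like this very cheaply, but queries that are not constrained by PartitionKey/RowKey degrade to scans, which matters for the bulk OData downloads described above.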
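And as one example of the throttling mitigation mentioned in the last comment: the standard pattern is retry with exponential backoff (Microsoft shipped a packaged version as the Transient Fault Handling Application Block). A hand-rolled sketch, with a placeholder connection string and query:

```csharp
using System;
using System.Data.SqlClient;
using System.Threading;

class ThrottlingRetrySketch
{
    // Minimal retry-with-backoff. Production code would inspect the
    // SqlException error number and retry only transient errors
    // (e.g. 40501, SQL Azure's "service is busy" throttling code).
    static T ExecuteWithRetry<T>(Func<T> action, int maxAttempts = 5)
    {
        for (int attempt = 1; ; attempt++)
        {
            try
            {
                return action();
            }
            catch (SqlException)
            {
                if (attempt >= maxAttempts) throw;
                // Exponential backoff: 200 ms, 400 ms, 800 ms, ...
                Thread.Sleep(TimeSpan.FromMilliseconds(
                    200 * Math.Pow(2, attempt - 1)));
            }
        }
    }

    static void Main()
    {
        // Placeholder connection string and query.
        int count = ExecuteWithRetry(() =>
        {
            using (var conn = new SqlConnection("Server=...;Database=...;"))
            using (var cmd = new SqlCommand(
                "SELECT COUNT(*) FROM Transactions", conn))
            {
                conn.Open();
                return (int)cmd.ExecuteScalar();
            }
        });
        Console.WriteLine(count);
    }
}
```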