4

I'm currently facing a performance issue on my website. The situation can be summarized by the following:

  • It's an ASP.NET MVC website using SQL Server 2012 Express, hosted on Windows Server 2012.
  • There are around 10 million UPDATE queries per day for the core system (5 million on each of 2 tables) and another 10 million for custom statistics purposes (again, 2 × 5 million).
  • There are far fewer INSERT queries.
  • The MVC website itself has no performance issue; most requests are handled by one method returning plain Content (not HTML).

Since the statistics are not as important as the core system, and I can see SQL Server struggling a lot, I thought it might be good to move these statistics tables somewhere else.

The main question is: what is the best way to handle stats where UPDATEs are predominant? The idea is also to keep only one server.

I had a look at what could be done to improve the situation:

  • Having a separate SQL Server database for stats on another hard disk? Maybe SQL Server could breathe better, but I'm not sure.
  • Using a NoSQL database? I have only minor experience with MongoDB (and not at millions of requests), and I'm tempted to try RavenDB.
  • Using a cache system? Redis looks great, but I'm not sure it's a good idea to run it on Windows. Is AppFabric a viable option?

Any ideas relevant to the current situation would be appreciated.
Thank you

Here is more info about one of my statistics tables:

CREATE TABLE [dbo].[UserStat](
    [Id] [int] IDENTITY(1,1) NOT NULL,
    [UserId] [int] NOT NULL,
    [Hits] [int] NOT NULL,
    [Points] [decimal](38, 6) NOT NULL,
    [Date] [datetime] NOT NULL,
    [LastHit] [datetime] NOT NULL,
    [Ip] [varchar](256) NULL
)

I update the stat like this:

UPDATE [UserStat] SET Hits = Hits + 1, Points = Points + @Points, LastHit = @Last WHERE UserId = @Id AND [Ip] = @Ip AND [Date] = @Date

If the row for the current user, IP, and date does not exist, I create it:

INSERT INTO [UserStat] ([UserId],[Hits],[Points],[Date],[LastHit],[Ip]) VALUES (@UserId,@Hits,@Points,@Date,@LastHit,@Ip)
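
For illustration, here is roughly what the two statements look like combined into one atomic batch. This is only a sketch (I currently run them as two separate commands; parameter names follow the INSERT above): the locking hints are there because two concurrent requests could otherwise both miss the UPDATE and both attempt the INSERT.

BEGIN TRANSACTION;

-- Try the update first; the hints hold a range lock so no concurrent
-- request can insert the same user/IP/date row between the two statements.
UPDATE [UserStat] WITH (UPDLOCK, SERIALIZABLE)
SET Hits = Hits + 1, Points = Points + @Points, LastHit = @LastHit
WHERE UserId = @UserId AND [Ip] = @Ip AND [Date] = @Date;

-- No row was updated: first hit for this user/IP/date, so create it.
IF @@ROWCOUNT = 0
    INSERT INTO [UserStat] ([UserId],[Hits],[Points],[Date],[LastHit],[Ip])
    VALUES (@UserId, 1, @Points, @Date, @LastHit, @Ip);

COMMIT TRANSACTION;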

There are two indices:

  • 1 for the primary key
  • 1 for getting and aggregating the stats of a user

    CREATE NONCLUSTERED INDEX [Select_UpdateUserStatNavigation] ON [dbo].[UserStat](
    [UserId] ASC, [Date] ASC) INCLUDE ([Id],[Hits],[Points],[LastHit],[Ip])
    

The server is a VPS. The log and data files are on the same disk. No foreign keys are involved on this table.

Here are all the SELECT queries I found that query this table:

SELECT Points, Hits, [Date] FROM [UserStat] WHERE UserId = @UId AND [Date] >= @date

SELECT Ip FROM [UserStat] WHERE UserId = @UId AND LastHit >= DATEADD(MINUTE,-15,getdate())

SELECT COUNT(Id) FROM [UserStat] WHERE [LastHit] >= DATEADD(MINUTE,-15,getdate())

But I'm not really worried about the SELECTs, more about the number of UPDATEs ^^.

Tom741
  • Is it a bulk update or a per-row update? If there are fewer INSERT queries, how come there are more UPDATEs? Are the same records updated several times? Is there any concurrency issue? At the least you need Enterprise edition. Are there indexes on the tables? – KumarHarsh Feb 02 '15 at 12:20
  • We really need the DDL for your tables to help with this. Also, are you using physical or virtual servers? Have you got your log and data files on different disks? Are your tables involved in any foreign key relationships? What indexes do you have? – DB101 Feb 02 '15 at 14:28
  • Thank you for your comments; I added some information below the original question. About having the log and data files on separate disks: I only discovered that advice when I started to get performance problems :/. If I remember correctly, I need to stop the DB to move the log file. If necessary I will have a look at it. – Tom741 Feb 02 '15 at 16:05

4 Answers

2

This is really a very simple and good example of when a NoSql database is appropriate. NoSql was created for "web-scale" applications such as this, where the pace and volume of data simply outstrip the ability of SQL databases to keep up (a big weakness of relational DBMSs).

In fact, regular SQL is not a good option for your scenario. There are several reasons for this, including:

  1. SQL is useful for handling relational data. The data here has no real relationships or dependencies (at least, not that you have described), and in reality, even modestly complex relational datasets would work better denormalized and put into NoSql platforms.
  2. SQL introduces a tremendous amount of overhead. Simply running a query to get a single value from the database costs at a minimum 3-5x what it would in a NoSql datastore, due to query interpretation, finding the index, querying the index, pulling the value, etc., whereas a NoSql datastore gets you the record in one step.
  3. SQL databases are designed primarily to be highly consistent. This means that they generally can't reside on more than one machine (though this isn't completely true anymore) and they have additional overhead to ensure data remains consistent.

Now, let's look at your particular use-case:

  1. Lots of update transactions, with the occasional insert. Most NoSql database platforms use a Set operation, which updates or inserts as necessary. No need to run two statements to do this every time.

  2. Single Primary Key. NoSql databases are key-value stores, where your key (in this case, UserId) points to a single record in the database.

  3. Simple stats and indexing. Several NoSql databases offer built-in indexing capability, and some even allow you to do map-reduce on your data to get the detailed stats out. Others do data aggregation automatically and you can write special queries to get the data you need. In this case, your "Stat Id" field is useless and can be dropped (yay, less storage taken!).

  4. Fast and Scalable. This is something you just aren't going to touch with a SQL database. NoSQL was designed for this.

Given the above, your scenario is a textbook example of when to apply a NoSql solution. I can recommend Couchbase, which is an extremely fast in-memory database with disk-based storage (this fulfills your cache and data storage requirement in one shot). You might also consider Elasticsearch for your statistics store as it can perform some very good data aggregations out of the box. Whichever NoSql solution you pick, you will get flexible scalability and ease of maintenance. I daresay your days of being a full-time DBA will be over.

theMayer
1

Can you confirm that ID is your primary key? If so, then that will be fine as it is a monotonically increasing value and good for Inserts. I think your other index (for the purposes of the updates) should be

CREATE NONCLUSTERED INDEX [Select_UpdateUserStatNavigation] ON [dbo].[UserStat](
[UserId] ASC, [Ip] ASC, [Date] ASC)

Make sure the columns are ordered from the most selective to the least selective in the index. This should speed up the updates in the sense that rows can be located quicker. We can look at indexes for the SELECT later.

By VPS, do you mean it is a virtual server? I would take a look at your IO stats to check that IO is not a bottleneck. How much memory is allocated to SQL? That could be another issue. Insufficient memory could result in paging to disk - the slowest part of your IO subsystem.

I would take a look at splitting your log and data files onto separate disks if possible. Having them on the same disk can cause disk contention - again on the slowest part of your IO subsystem.
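
If you do move the log file, it only takes a couple of statements plus a short outage. Here is a sketch, assuming a hypothetical database named MyDb with the logical log file name MyDb_log and a second disk mounted as D: (adjust the names and paths to your setup):

-- Register the new location in the catalog (takes effect when the DB next starts).
ALTER DATABASE MyDb MODIFY FILE (NAME = MyDb_log, FILENAME = N'D:\SQLLogs\MyDb_log.ldf');

-- Take the database offline, move the .ldf file to D:\SQLLogs in the OS, then:
ALTER DATABASE MyDb SET OFFLINE WITH ROLLBACK IMMEDIATE;
-- (move the physical file now)
ALTER DATABASE MyDb SET ONLINE;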

Can you post the select queries that are used? I can then give suggested indexes if required.

In addition, you may want to replace your separate insert and update procs with a MERGE such as below.

MERGE UserStat AS TargetTable
USING (SELECT @UserId UserID, @Hits Hits, @Points Points, @Date [Date], @LastHit LastHit, @Ip Ip) AS SourceData
ON SourceData.UserID = TargetTable.UserId
    AND SourceData.Ip = TargetTable.Ip
    AND SourceData.[Date] = TargetTable.[Date]
WHEN MATCHED THEN UPDATE SET Hits = TargetTable.Hits + 1, Points = TargetTable.Points + SourceData.Points, LastHit = SourceData.LastHit
WHEN NOT MATCHED THEN INSERT (UserId, Hits, Points, [Date], LastHit, Ip)
                 VALUES (SourceData.UserID, SourceData.Hits, SourceData.Points, SourceData.[Date], SourceData.LastHit, SourceData.Ip);
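
Note that MERGE on its own is still open to the same race as the separate UPDATE/INSERT pair: two sessions can both fail the match and both attempt the INSERT. A commonly used mitigation is a HOLDLOCK (serializable) hint on the target, i.e. change the first line to

MERGE UserStat WITH (HOLDLOCK) AS TargetTable

and leave the rest of the statement as-is.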
DB101
  • Hello DB101. Yes, Id is the primary key. Yes, it's a virtual server. The disks seems to be quite solicited by SQL Server as well. The server has 8 GB of memory and SQL Server usually consumes between 2.5 and 3 GB. There's around 1-2 GB of free memory. I didn't know that the order of the columns in the index had an impact. If I usually get the last rows based on a close date ([Date] >= @date), should I put [Date] DESC in the index declaration? – Tom741 Feb 03 '15 at 13:10
  • If the date is the most selective value, then that should be put at the start of the index. I see that you are using a Datetime column - so if you are storing the time as well ie dd mm yy hh mm ss as opposed to dd mm yy 00 00 00, then you can put the Date as the leading column of the index as it will be highly selective. – DB101 Feb 03 '15 at 13:27
  • I don't understand what you mean by 'The disks seems to be quite solicited by SQL Server as well'. – DB101 Feb 03 '15 at 13:28
  • I usually have a look in the Resource Monitor. For the disk, the most solicited file is the log file. From memory (I can confirm the figures tonight), the disk queue length is around 0.20, and every 5 seconds it jumps to 1 because of some write operations (if I remember correctly, on the mdf file). – Tom741 Feb 03 '15 at 13:58
  • OK - that indicates that disk contention is not a problem. Are you running IIS on the same server? Also what is the table and database size. @Vahid Farahmandian may be correct in that you may need to upgrade versions. However I'd rather eliminate other possibilities before giving you a solution that will cost money. You don't have a virus scanner running on SQL do you? – DB101 Feb 03 '15 at 14:09
  • Yes, IIS is on the same server. I don't remember the table size; I look at it sometimes, but it's not the biggest table. The whole DB is around 1.5 GB. And there is no antivirus on the server. – Tom741 Feb 03 '15 at 14:13
  • Express is limited to 1 GB RAM, so you may be hitting a limit there. I don't understand how you get a reading of 2.5/3 GB consumed by SQL. SQL Express is also limited to the lesser of 1 socket or 4 cores http://stackoverflow.com/questions/1169634/limitations-of-sql-server-express – DB101 Feb 03 '15 at 14:30
  • Apparently having 3 GB consumed on the Express edition is OK: http://dba.stackexchange.com/questions/33088/why-does-sql-server-2012-express-use-9-5gb-of-ram-on-my-server. The 1 GB limit is for the buffer pool. I'll try to see tonight if the struggle comes from that. My server has 4 cores on 1 socket, so no issue on this side. – Tom741 Feb 03 '15 at 15:23
  • Hello, here are some figures: there's a constant 500 KB/s write on the log file. Disk queue length is around 0.20. Sometimes (every 10-15 seconds), the data file goes up to 15 MB/s and the queue length goes to 1. Free memory on the server: 1.25 GB. And the SQL Server process is just below 3 GB. CPU for SQL Server is always between 35 and 50 percent (accumulated value for the 4 cores). – Tom741 Feb 04 '15 at 14:42
  • OK - your disk io seems ok. Did you try the index suggestion and MERGE query above? – DB101 Feb 04 '15 at 16:25
  • Hello, creating the index suggested for the update introduced a key lookup on Hits and Points, so apparently it is less efficient than the old index. – Tom741 Feb 05 '15 at 08:30
  • Moving the log file to another disk was apparently a good idea. I don't see the "high peaks" of writes on the mdf anymore. Both the log and mdf files write constantly at around 500 KB/s, and the disk queue length is lower. – Tom741 Feb 05 '15 at 16:51
  • Excellent - I'm glad it worked. Is overall performance now acceptable? – DB101 Feb 05 '15 at 16:59
  • For the moment yes, but for the future I don't know. Following theMayer's answer, I had a look at the NoSQL options; I will maybe try MongoDB with the "$inc" operator, which is apparently perfect for repeated updates like mine. I'll try to run SQL Server and MongoDB side by side to see which one performs best in this scenario. Thanks for your help. – Tom741 Feb 05 '15 at 21:30
  • You will find similar (rough order of magnitude) performance between a well-tuned SQL and Mongo for reasonable numbers of writes. Couchbase was about 1 order of magnitude better in this scenario - but where it pulls away is for massive numbers of writes - you will find Couchbase will significantly outperform Mongo, and both will blow SQL away. The really big difference is in the amount of effort you have to dedicate to maintaining your SQL performance (very high) vs. the others (low). – theMayer Feb 06 '15 at 14:22
0

RavenDB is extremely easy to get up and running for this scenario. You will get quick writes and potentially quick reads. You also get ACID, or as close as you're going to get. RavenDB is easy to wire up in MVC. Since you have Mongo experience, the concept of a document should not be foreign to you. Use the RavenDB C# client libraries in your MVC app, and within a few hours you should be able to make major progress. Just be sure you understand the limitations: by default, queries may be a few milliseconds behind updates, and like many NoSql or CQRS solutions, if you wipe out the cache for the queries, it may take a few minutes to a few hours to fully rebuild it.

Eric Rohlfs
-1

Before getting into the main problem, some changes should happen:

You should migrate from the Express edition to the Enterprise edition, or at least the Standard edition (Express vs others).

As you have tons of updates on your data, you should disable your indexes (if there are any); see the sketch below.
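
For example (a sketch, using the nonclustered index name from the question; note you should not disable the clustered primary key index this way, as that would make the whole table inaccessible):

-- Disable the nonclustered index: its definition is kept,
-- but it is no longer maintained during updates or used by queries.
ALTER INDEX [Select_UpdateUserStatNavigation] ON [dbo].[UserStat] DISABLE;

-- ... run the heavy update workload ...

-- Bring it back by rebuilding it from the table data.
ALTER INDEX [Select_UpdateUserStatNavigation] ON [dbo].[UserStat] REBUILD;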

Try to resize your table's columns so each record can be stored in fewer pages; this will help speed up your updates. For example, if you have a table with 20 columns and you always update the same 5 known columns, separate those 5 columns from the other 15. This may help arrange your data in fewer pages, and with fewer pages your records can be found faster. Of course this is not based on normal forms, but it can help performance.

Take a look at your available memory and CPU; these two are cornerstones of performance.

And about your main problem: I need to know more about your stats, their table, and their usage. Do you mean SQL Server statistics, or something else?

Vahid Farahmandian
  • I disagree. There is nowhere near enough information for you to provide those recommendations. – DB101 Feb 02 '15 at 14:29
  • Which recommendations do you mean? I also provided some general tips that everyone should consider, but for his main question I have asked him to give some more detailed information. – Vahid Farahmandian Feb 02 '15 at 19:42
  • Your suggestion to migrate from Express to another version could be expensive in licensing terms when this hasn't even been confirmed as the cause of the issue. Also, disabling indexes will not speed up inserts, as it will cause table scans. – DB101 Feb 03 '15 at 12:40
  • @DB101 thank you for your comments. Insertion is affected by indexes: before each insert takes place, indexes must be rearranged to put the record in its appropriate place, and in some cases page splitting may be required. These processes can be expensive when a heavy load of records is inserted into a table. About Express: as you know, Express has limited capabilities and is not good for such a heavy scenario, as it uses a limited amount of memory and does not support some beneficial features. These are my reasons for suggesting them in my comment. – Vahid Farahmandian Feb 03 '15 at 13:05
  • My mistake - I meant disabling indexes will not speed up updates, especially in an update-biased environment such as that mentioned above. I don't believe Express is deficient (yet!) as we don't know enough about the data sizes. There may be 10 million updates, but he says that inserts are considerably fewer, so he may not be hitting any of the database size limits (yet). Also note that his data rows aren't particularly wide, so selecting records (if indexed correctly) should not be problematic unless he is selecting millions of rows. – DB101 Feb 03 '15 at 13:22
  • @DB101 you have interesting and valuable comments, thank you. I think the Express edition is the bottleneck, or will become the bottleneck, in such a scenario, because we are going to face a memory lacking alert! About updates: if the search is based on an index and non-indexed fields are being updated, then I agree with you; but if an indexed field is going to be updated, then I think the indexes should be disabled. – Vahid Farahmandian Feb 03 '15 at 13:32
  • Is there a way I can see if the memory is the bottleneck? Where can I find this "memory lacking alert"? – Tom741 Feb 03 '15 at 13:52
  • @Tom741 yes, there are plenty of ways you can find memory bottlenecks. Check this link: http://www.mssqltips.com/sqlservertip/2304/how-to-identify-microsoft-sql-server-memory-bottlenecks/ and also check this: https://msdn.microsoft.com/en-us/library/ms190994.aspx – Vahid Farahmandian Feb 03 '15 at 14:22