
I am developing an application that checks the database for new data on startup, and when work is done (on shutdown/log-off) it pushes the performance logs to the database. The users themselves are not changing any data; they are only generating logs (the money comes from their use of the data ;)

When a user is done with the work, the application pushes the logs to the database (a MySQL database). I do not want to constantly push data, because connections are expected to drop and go offline during the work day (mobile work), and the less time spent online the better. This means the application has to be able to work in offline mode too.

The log pushed for a single user is usually about 2,000 records, and each record contains about 70 bytes of data. There are about 100 users at peak time (which may grow to 300 in the near future), which makes about 200,000 log records pushed to the MySQL database each day. Because the users work the same hours, there are going to be heavy peak times. The worst case is 200,000 records of 70 bytes each arriving at the same time (~14 MB of data).

The database I am using is MySQL, chosen mostly because:

  • It is free (a selling point)
  • I can find help online
  • It is a standard database, meaning other IT departments most likely know it already

I am developing the application using C# on .NET 4.5.

I have tried to use Entity Framework. It is very easy to start with, but it falls short on performance: the 2,000 log inserts for a single user take about 7 seconds when I run the program and the server on my developer machine. 7 seconds is unacceptable (and it will probably increase dramatically when 200 users are doing it at the same time). From what I have read, Entity Framework issues every insert as a separate SQL command, one command at a time.
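For reference, a minimal sketch of the pattern I am using (EF 6 here; the entity and context are simplified placeholders, not my real schema):

```csharp
using System;
using System.Collections.Generic;
using System.Data.Entity; // Entity Framework 6

// Simplified placeholder entity/context, for illustration only.
public class LogRecord
{
    public int Id { get; set; }
    public DateTime Timestamp { get; set; }
    public string Message { get; set; }
}

public class LogContext : DbContext
{
    public DbSet<LogRecord> Logs { get; set; }
}

public static class EfInsertDemo
{
    public static void PushLogs(List<LogRecord> records)
    {
        using (var db = new LogContext())
        {
            db.Logs.AddRange(records); // all 2,000 rows staged in memory at once
            db.SaveChanges();          // but EF6 still emits one INSERT per row
        }
    }
}
```

Even with a single SaveChanges call, the statements go to the server one row at a time, which seems to be where the 7 seconds go.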

So I have tried to use MySQL Connector/Net instead. But I do not want to do it like Entity Framework does and issue each insert as a single command, so my eyes went to MySqlBulkLoader. It only accepts a file, though, not raw data. Is there a way to load MySqlBulkLoader with data from within the program? I would prefer not to have to save the data to the hard disk just to send it to the database; it feels like an unnecessary detour.

So my questions are (no more storytelling ;):

  • Can I load MySqlBulkLoader with data from memory without creating a file on the disk?
  • Should I use MySQL Connector/Net or is there another way I should do it (like raw SQL statements)?

EDIT: THE ANSWER IS

Use MySQL Connector/Net with raw SQL commands and make the insert a batch insert, like [this](http://stackoverflow.com/questions/5526917/how-to-do-a-batch-insert-in-mysql).
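A rough sketch of what that looks like with Connector/Net (table and column names below are placeholders for my real schema):

```csharp
using System;
using System.Collections.Generic;
using System.Text;
using MySql.Data.MySqlClient;

public class LogRecord
{
    public DateTime Timestamp { get; set; }
    public string Message { get; set; }
}

public static class BatchInsert
{
    // Builds one multi-row INSERT covering every record and executes it
    // in a single round trip. "logs", "ts" and "msg" are placeholder names.
    public static void PushLogs(string connectionString, IList<LogRecord> records)
    {
        var sql = new StringBuilder("INSERT INTO logs (ts, msg) VALUES ");
        using (var conn = new MySqlConnection(connectionString))
        using (var cmd = conn.CreateCommand())
        {
            for (int i = 0; i < records.Count; i++)
            {
                if (i > 0) sql.Append(',');
                sql.AppendFormat("(@ts{0}, @msg{0})", i);
                cmd.Parameters.AddWithValue("@ts" + i, records[i].Timestamp);
                cmd.Parameters.AddWithValue("@msg" + i, records[i].Message);
            }
            cmd.CommandText = sql.ToString();
            conn.Open();
            cmd.ExecuteNonQuery();
        }
    }
}
```

Building the VALUES list with parameters keeps the statement safe against injection while still sending all rows in one round trip.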

  • Just curious, for your current set up and inserting the 2000 records - are you doing a single row at a time (insert record, commit 2000x) or are you setting up your entity to have all 2000 records prior to doing a commit? I would think you were doing the latter to be getting 7 seconds insert time, where the former would be much faster. – Kritner Jul 11 '14 at 12:25
  • Are you 100% certain that the Entity Framework is to blame and not the 2000 log records you are trying to insert? First determine where the bottle neck lies. Are there indexes on the tables causing inserts to be slow, etc? – Greg Burghardt Jul 11 '14 at 12:27
  • Did you try to concatenate your INSERT command to one single string? I mean "INSERT INTO xxxxx; INSERT INTO zzzzz;...." and then submit all the commands together with a single ExecuteNonQuery? – Steve Jul 11 '14 at 12:28
  • Kritner- I am inserting all 2000 records and then doing only 1 commit. I would expect 2000 commits to be slower, but I will try it. – Mr. JWolf Jul 11 '14 at 12:29
  • Steve - I want to insert 2000 records as fast and as simple as possible. – Mr. JWolf Jul 11 '14 at 12:29
  • My comment is linked to your use of the raw MySQL connector, in that case you need to build manually your query and thus, instead of calling 2000 times the ExecuteNonQuery you could loop to build a single query with 2000 insert. There is also the possibility to [use Dapper](https://github.com/StackExchange/dapper-dot-net) as your mini ORM (see the example about Executing a command multiple times) – Steve Jul 11 '14 at 12:40
  • @Mr.JavaWolf If I was misunderstood I apologize, I meant a single transaction/commit is preferred over one row at a time - if that's what you're already doing then that's good :) – Kritner Jul 11 '14 at 12:49
  • @Mr.JavaWolf lolol I think I mixed up former and latter :P – Kritner Jul 11 '14 at 12:51
  • @Kritner - Ahh ye, seems you are right – Mr. JWolf Jul 11 '14 at 12:59
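For reference, Steve's Dapper suggestion above would look roughly like this (note that Dapper runs the statement once per element of the enumerable, so it is not a true batch insert, just much less boilerplate; names are placeholders):

```csharp
using System;
using System.Collections.Generic;
using Dapper; // https://github.com/StackExchange/dapper-dot-net
using MySql.Data.MySqlClient;

public class LogRecord
{
    public DateTime Timestamp { get; set; }
    public string Message { get; set; }
}

public static class DapperInsert
{
    // Dapper's "execute many": the statement is executed once per record,
    // so this is still row-by-row on the wire. Table/column names are
    // placeholders.
    public static void PushLogs(string connectionString, IEnumerable<LogRecord> records)
    {
        using (var conn = new MySqlConnection(connectionString))
        {
            conn.Open();
            conn.Execute(
                "INSERT INTO logs (ts, msg) VALUES (@Timestamp, @Message)",
                records);
        }
    }
}
```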

1 Answer


Suppose the records numbered 2,000,000 instead of just 2,000. EF, like other ORMs, is designed for ease of coding in normal transactional workloads, not for performance-critical, intensive workloads.

The answer is simple: if you are not satisfied after you refactor your code to insert all items in a single DB transaction over a single connection (7 seconds is really too much for me too, unless you close and reopen the connection every time), you should use raw SQL statements in that part of the code and continue using EF elsewhere.

There is no other way. Batch processing done right is done with plain old SQL.

And MySqlBulkLoader is made only for file-system operation, though the file can be a temporary file.
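A rough sketch of that temporary-file route, assuming a simple two-column log table (table name and record layout are placeholders):

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using MySql.Data.MySqlClient;

public class LogRecord
{
    public DateTime Timestamp { get; set; }
    public string Message { get; set; }
}

public static class BulkLoadDemo
{
    public static void PushLogs(string connectionString, IEnumerable<LogRecord> records)
    {
        // Write the records to a temp file in tab-separated form.
        string path = Path.GetTempFileName();
        try
        {
            using (var writer = new StreamWriter(path))
            {
                writer.NewLine = "\n"; // must match LineTerminator below
                foreach (var r in records)
                    writer.WriteLine("{0:yyyy-MM-dd HH:mm:ss}\t{1}", r.Timestamp, r.Message);
            }

            using (var conn = new MySqlConnection(connectionString))
            {
                conn.Open();
                var loader = new MySqlBulkLoader(conn)
                {
                    TableName = "logs",   // placeholder table name
                    FileName = path,
                    FieldTerminator = "\t",
                    LineTerminator = "\n",
                    Local = true          // read the file from the client side
                };
                loader.Load(); // issues LOAD DATA [LOCAL] INFILE under the hood
            }
        }
        finally
        {
            File.Delete(path); // the detour only lives as long as the upload
        }
    }
}
```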

  • I have tried your "plain old SQL" as you mentioned, creating a string like [THIS](http://stackoverflow.com/questions/5526917/how-to-do-a-batch-insert-in-mysql) where I insert multiple rows in a single statement. My 7 seconds went down to about 0.1 seconds!!! This does seem to solve my problem. Thanks. – Mr. JWolf Jul 11 '14 at 13:34
  • Actually it is faster, those 0.1 seconds is including: 1: Creation of data(random garbage) 2: set up connection (localhost) 3: push data to server(localhost) 4: shut down connection (localhost) – Mr. JWolf Jul 11 '14 at 13:53
  • @Mr.JavaWolf you should do some tuning. this article (https://docs.jboss.org/hibernate/orm/3.3/reference/en/html/batch.html) is about Hibernate ORM for Java but **the very same principles** apply to .NET. So please insert 20 or 50 rows for statement for optimal performance. Find yourself the magic number (my company spent lots of man-hours and resulted in not being any single magic number for any generic project) – usr-local-ΕΨΗΕΛΩΝ Jul 11 '14 at 14:08
  • How come the number is so low? I can see the point not adding 1 million at one time, but 20-50? I would think around 500-1000, or am I very very wrong(naive)? – Mr. JWolf Jul 11 '14 at 14:34
  • The number comes from a trade-off between maximizing network usage and minimizing server lags. Very difficult to explain but I agree that in my projects we use batches of 500 (magic number) – usr-local-ΕΨΗΕΛΩΝ Jul 11 '14 at 14:42
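To make that tuning concrete, here is a rough sketch of chunking the multi-row INSERT into batches, under the assumptions above (placeholder table/column names, 500 rows per statement, one transaction around the whole upload):

```csharp
using System;
using System.Collections.Generic;
using System.Text;
using MySql.Data.MySqlClient;

public class LogRecord
{
    public DateTime Timestamp { get; set; }
    public string Message { get; set; }
}

public static class ChunkedInsert
{
    // Splits the upload into multi-row INSERTs of batchSize rows each, all
    // inside one transaction so a dropped connection leaves nothing
    // half-written. 500 is the "magic number" from the discussion above.
    public static void PushLogs(string connectionString,
                                IList<LogRecord> records,
                                int batchSize = 500)
    {
        using (var conn = new MySqlConnection(connectionString))
        {
            conn.Open();
            using (var tx = conn.BeginTransaction())
            {
                for (int start = 0; start < records.Count; start += batchSize)
                {
                    int count = Math.Min(batchSize, records.Count - start);
                    var sql = new StringBuilder("INSERT INTO logs (ts, msg) VALUES ");
                    using (var cmd = conn.CreateCommand())
                    {
                        cmd.Transaction = tx;
                        for (int i = 0; i < count; i++)
                        {
                            if (i > 0) sql.Append(',');
                            sql.AppendFormat("(@ts{0}, @msg{0})", i);
                            var r = records[start + i];
                            cmd.Parameters.AddWithValue("@ts" + i, r.Timestamp);
                            cmd.Parameters.AddWithValue("@msg" + i, r.Message);
                        }
                        cmd.CommandText = sql.ToString();
                        cmd.ExecuteNonQuery();
                    }
                }
                tx.Commit();
            }
        }
    }
}
```

The batch size is the knob to experiment with: larger batches mean fewer round trips but longer-running statements on the server.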