
I have a big loop that updates an Access database of 2,800,000 records. I divided the loop into 7 threads so each thread works on 400,000 records. The loop takes about 0.7 seconds to update a single record because there are a lot of calculations to be done.

I am sure the threads will help make the process much faster, because I tested the application on a 7200 RPM HDD, an SSD and a RAM disk and the speed difference is not really noticeable, so IO is not the bottleneck.

I want the first thread to process the first 400k records, the second thread to process the next 400k records, and so on.
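
Roughly, this is the shape I have in mind. It is only a minimal C# sketch, assuming an ID column numbered 1 to 2,800,000, a table called Results and the ACE OLEDB provider (all placeholders, not my real code), with ProcessRow() standing in for the calculations:

using System.Data;
using System.Data.OleDb;
using System.Threading.Tasks;
using System.Windows.Forms;

public class MainForm : Form
{
    const string ConnStr =
        @"Provider=Microsoft.ACE.OLEDB.12.0;Data Source=C:\data\simulation.accdb";

    readonly DataGridView dataGridView1 = new DataGridView { Dock = DockStyle.Fill };

    public MainForm()
    {
        Controls.Add(dataGridView1);
    }

    // Each range gets its own thread, its own connection and its own DataTable.
    static DataTable ProcessRange(int firstId, int lastId)
    {
        var table = new DataTable();
        using (var conn = new OleDbConnection(ConnStr))
        using (var adapter = new OleDbDataAdapter(
                   "SELECT * FROM Results WHERE ID BETWEEN ? AND ?", conn))
        using (var builder = new OleDbCommandBuilder(adapter))
        {
            adapter.SelectCommand.Parameters.AddWithValue("?", firstId);
            adapter.SelectCommand.Parameters.AddWithValue("?", lastId);
            adapter.MissingSchemaAction = MissingSchemaAction.AddWithKey; // key info for Update()
            adapter.Fill(table);

            foreach (DataRow row in table.Rows)
                ProcessRow(row);          // the ~0.7 s of calculations per record

            adapter.Update(table);        // write the changed rows back
        }
        return table;
    }

    static void ProcessRow(DataRow row) { /* the real calculations go here */ }

    // Call this e.g. from a button click.
    void RunAllRanges()
    {
        // Seven ranges of 400,000 records each, processed in parallel.
        var tasks = new Task<DataTable>[7];
        for (int i = 0; i < 7; i++)
        {
            int first = i * 400000 + 1, last = (i + 1) * 400000;
            tasks[i] = Task.Run(() => ProcessRange(first, last));
        }
        Task.WaitAll(tasks);              // note: this blocks the UI thread in the sketch

        // Combine the per-thread tables and show the result in one grid.
        DataTable combined = tasks[0].Result;
        for (int i = 1; i < tasks.Length; i++)
            combined.Merge(tasks[i].Result);
        dataGridView1.DataSource = new BindingSource { DataSource = combined };
    }
}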

- What would be the right way to do this?

- Should each thread have its own DataTable and BindingSource?

- How would you combine the results into one table and show it in a DataGridView when the process is done?

user1590636
  • Is this a one-time operation? If so, why bother writing a threaded method? If not, could it be a hint of bad design that you have to update 2.8 million database records on a regular basis? If not, maybe Access is the wrong technology to use in the first place? In any case, this question indicates a different problem than threaded access to a DB. – Tomalak Aug 11 '12 at 12:36
  • The operation is to be done once every 3 months – user1590636 Aug 11 '12 at 12:37
  • There were a couple of other questions in my previous comment, care to answer them? Also: Is a set operation (`UPDATE Table SET Field = CalculatedValue()`) not possible? Note that you can use user-defined functions from VBA in Access SQL statements. Further: how long does it take *now*? Note that you will not increase the speed seven-fold by using seven threads. – Tomalak Aug 11 '12 at 12:42
  • Theoretically it will take 22 days with a single thread; I am hoping to reduce it to about 4 days with multiple threads – user1590636 Aug 11 '12 at 12:43
  • The loop updates columns 4 to 13 with new numerical data; after that it does some calculations on the new data and fills the results into the remaining 36 columns, then it colors the row in the datagrid depending on the results. Also, the data is simulation data with a lot of decimals – user1590636 Aug 11 '12 at 12:47
  • So I am not sure if I could use (`UPDATE Table SET Field = CalculatedValue()`) – user1590636 Aug 11 '12 at 12:49
  • I have the strong feeling that you have an architectural problem. You are using the wrong tools in the wrong way. Writing a threaded update mechanism will not help you. Also, even with a threaded update the execution time will not drop from 22 to 4 units of time. That would mean "threading" would - magically - lead to a linear performance increase. It won't happen. Under no circumstances. – Tomalak Aug 11 '12 at 12:53
  • I will decide after I try it! – user1590636 Aug 11 '12 at 13:05
  • Good luck with that. But before that, try to get the single SQL statement approach to work and see how well that goes. **AN AWFUL LOT** of performance is lost by issuing 2.8m separate UPDATE statements; you would not believe how much. – Tomalak Aug 11 '12 at 13:15
  • It seems that waiting 22 days is the only solution... – user1590636 Aug 11 '12 at 13:21
  • I really don't know how else I can possibly put it, so I will say it again: **1)** Have you actually *tried* to make that a set operation? That means "a single UPDATE statement that affects 2.8 million records". **2)** Have you tried a different technology than Access? **3)** Have you evaluated whether updating so many records on a regular basis is avoidable? **4)** 2.8m records are a lot, but 22 days? It's hard to imagine. Maybe your calculation is inefficient? Maybe you don't have the correct indexes set? There are **so many** possibilities other than waiting 22 days. – Tomalak Aug 11 '12 at 13:35
  • Does updating the color of the row in the datagrid contribute to the time it takes to update a record? Perhaps you should do all the updating separately from the UI and then redisplay the results in the grid. Take the UI out of the equation. – Chris Dunaway Aug 13 '12 at 13:46

2 Answers


An Access database is simply a file. At the point of reading and writing to it you are going to bottleneck; on top of that, there's a significant chance of the file being corrupted by doing something like this. Imagine doing the same thing with an XML file of the data.

It all depends on what you are doing to the data. If no columns in any keys or indexes change, and not every record will be read or changed, then one thread to read and write plus a pool to process might get you somewhere. The processing would have to be significant enough to make it worthwhile spinning up more than one thread, though; as it is, the threads are going to be waiting on disk IO unless there is a significant amount of calculation. If you have indexes that will change and you don't need them during the operation, drop them, process, then put them back again.
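
For example, something along these lines (only a rough sketch; the connection string parameter, the Results table and Calculate() are placeholders). One thread owns the connection and the DataTable and does all the reading and writing; the pool only does the number crunching on plain copies of the row values, because a DataTable is not safe to write to from several threads at once:

using System.Collections.Concurrent;
using System.Data;
using System.Data.OleDb;
using System.Threading.Tasks;

static class SingleWriterSketch
{
    // One thread reads and writes; Parallel.For acts as the processing pool.
    public static void Run(string connStr)
    {
        using (var conn = new OleDbConnection(connStr))
        using (var adapter = new OleDbDataAdapter("SELECT * FROM Results", conn))
        using (var builder = new OleDbCommandBuilder(adapter))
        {
            var table = new DataTable();
            adapter.MissingSchemaAction = MissingSchemaAction.AddWithKey;
            adapter.Fill(table);                              // single reader

            var results = new ConcurrentDictionary<int, object[]>();
            Parallel.For(0, table.Rows.Count, i =>            // CPU-only worker pool
            {
                object[] snapshot = table.Rows[i].ItemArray;  // copy the row out
                results[i] = Calculate(snapshot);             // the heavy calculations
            });

            for (int i = 0; i < table.Rows.Count; i++)
                table.Rows[i].ItemArray = results[i];         // applied on one thread

            adapter.Update(table);                            // single writer
        }
    }

    static object[] Calculate(object[] values) => values;     // placeholder
}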

If you are making significant changes to the data, then maybe:

- Use one thread to read from the existing database.
- Create seven empty databases (you might want to adjust this number based on a sensible count of processors), each containing just this table.
- Read from the parent, throw the work into a processor pool (if it's worth having one), then write to one of the "seven" copies.
- Then clear out the original and write the data back from the copies, serially, putting it back together, roughly as sketched below.
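
A very rough sketch of that shape, assuming the part databases already exist with the same (empty) table; the paths, the SELECT and Calculate() are placeholders and error handling is omitted:

using System;
using System.Collections.Concurrent;
using System.Data;
using System.Data.OleDb;
using System.Linq;
using System.Threading.Tasks;

static class SevenCopySketch
{
    const int Copies = 7;
    const string ParentConnStr =
        @"Provider=Microsoft.ACE.OLEDB.12.0;Data Source=C:\data\parent.accdb";

    static string CopyConnStr(int i) =>
        @"Provider=Microsoft.ACE.OLEDB.12.0;Data Source=C:\data\part" + i + ".accdb";

    public static void Run()
    {
        var queues = Enumerable.Range(0, Copies)
            .Select(_ => new BlockingCollection<object[]>(boundedCapacity: 1000))
            .ToArray();

        // One writer task per copy database.
        var writers = Enumerable.Range(0, Copies)
            .Select(i => Task.Run(() => WriteCopy(CopyConnStr(i), queues[i])))
            .ToArray();

        // Single reader: pull rows from the parent, calculate (inline here; hand the
        // work to a pool as in the sketch above if the calculation dominates),
        // and deal the results out across the copies.
        using (var conn = new OleDbConnection(ParentConnStr))
        using (var cmd = new OleDbCommand("SELECT * FROM Results", conn))
        {
            conn.Open();
            using (var reader = cmd.ExecuteReader())
            {
                int n = 0;
                while (reader.Read())
                {
                    var values = new object[reader.FieldCount];
                    reader.GetValues(values);
                    queues[n++ % Copies].Add(Calculate(values));
                }
            }
        }
        foreach (var q in queues) q.CompleteAdding();
        Task.WaitAll(writers);

        // Final step (not shown): clear the parent table, then append the rows
        // back from part0..part6 one after the other.
    }

    static void WriteCopy(string connStr, BlockingCollection<object[]> queue)
    {
        using (var conn = new OleDbConnection(connStr))
        using (var adapter = new OleDbDataAdapter("SELECT * FROM Results", conn))
        using (var builder = new OleDbCommandBuilder(adapter))
        {
            var table = new DataTable();
            adapter.FillSchema(table, SchemaType.Source);   // the copy is empty: schema only
            foreach (var values in queue.GetConsumingEnumerable())
                table.Rows.Add(values);
            adapter.Update(table);                          // inserts into this copy, one thread
        }
    }

    static object[] Calculate(object[] values) => values;   // placeholder
}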

All that said, drop Access and use a full DBMS; with one you would probably see some of the improvement you expected.

Something to bear in mind when parallel processing: where is the bottleneck? In your case it is probably disk IO; multiple threads do not address that, so you just end up with seven threads twiddling their thumbs waiting for the disk drive.

Tony Hopkinson
  • An SQL Server database is simply a file. This statement means nothing. Comparing an Access database to an XML file is also a pretty bold statement. – Tomalak Aug 11 '12 at 12:50
  • Access to a full DBMS's volume file is controlled by a server; access (no pun intended) to an Access MDB is done through standard OS reads/writes, no different from an XML file or even a text file. As far as what the OP is trying to achieve, that statement meant everything. It has nothing to do with structure and complexity, and everything to do with the equivalent of seven users reading and writing to the same file. – Tony Hopkinson Aug 11 '12 at 12:56
  • IO is not a problem since I am using a RAM disk – user1590636 Aug 11 '12 at 12:57
  • Guys, at least it's worth trying, so I hope the answers will be about the best ways to do it rather than saying it won't work! – user1590636 Aug 11 '12 at 13:03
  • @user1590636 Sometimes, "it won't work" is the best advice you can get. – Tomalak Aug 11 '12 at 13:06
  • @Tony This is not true. Access databases are driven by and accessed through the MS Jet database engine. It supports page-level and record-level locking and concurrent readers/writers. It's an actual database engine, believe it or not. A comparison to an XML file or flat text is deeply flawed. – Tomalak Aug 11 '12 at 13:09
  • Instead of seven threads, try two. It depends on what they are doing, but there's a tipping point in adding threads where your processor effectively starts thrashing (spends more time switching threads than running them). – Tony Hopkinson Aug 11 '12 at 13:12
  • Will try it with 7 and post back ;) – user1590636 Aug 11 '12 at 13:21
  • The Jet engine runs on the client. Look up client/server Access; every real one links to a full DBMS backend, and there's a reason for that. I didn't say it wasn't a database engine, I said it wasn't client/server, and it isn't. – Tony Hopkinson Aug 11 '12 at 13:23
  • @TonyHopkinson Thanks for telling me what to look up. You know, I have never heard of these things before. Your comparison of an Access database to a flat file is still dead wrong, and that's what I was pointing out. – Tomalak Aug 11 '12 at 13:30
  • XML isn't a flat file. With a decent structure you can write to one part of it without corrupting the rest. Multiple simultaneous accesses to the same file complicate the job so much; a full DBMS effectively has a single process that does all of that, so one task doesn't trample all over another. Jet can't possibly do that. – Tony Hopkinson Aug 11 '12 at 14:43
  • @user1590636 IO may not be a problem, but IO synchronisation could be. If the DB you're using has to force all access to be serial (I don't know if it does; it's pretty much what Tony and Tomalak are arguing about), then that will lose some or all of the multi-thread advantage, because you're forced into doing everything one operation at a time. – Jon Hanna Aug 11 '12 at 16:04

I'd suggest trying to attack this from another angle. Looping through this many records and individually changing them is always going to be slow.

I suggest you try issuing update statements that update many records at a time and try to encapsulate your business logic by breaking it up into logical statements. For example:

UPDATE stock SET retail = 0.95 WHERE retail  < 1.5
UPDATE stock SET retail = 1.95 WHERE retail >= 1.5 and retail < 2.5
UPDATE stock SET retail = 2.95 WHERE retail >= 2.5 and retail < 3.5
UPDATE stock SET retail = 3.95 WHERE retail >= 3.5 and retail < 4.5
UPDATE stock SET retail = 4.95 WHERE retail >= 4.5 and retail < 5.5

If you can break up your business logic into separate statements, you'll find updating many records at a time substantially faster.
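
If you have to drive this from your .NET code rather than from inside Access, the same idea applies. A sketch along these lines (the connection string and provider are assumptions on my part) issues one command per business rule instead of one per record:

using System.Data.OleDb;

static class SetBasedUpdateSketch
{
    public static void Run()
    {
        var statements = new[]
        {
            "UPDATE stock SET retail = 0.95 WHERE retail <  1.5",
            "UPDATE stock SET retail = 1.95 WHERE retail >= 1.5 AND retail < 2.5",
            "UPDATE stock SET retail = 2.95 WHERE retail >= 2.5 AND retail < 3.5",
            "UPDATE stock SET retail = 3.95 WHERE retail >= 3.5 AND retail < 4.5",
            "UPDATE stock SET retail = 4.95 WHERE retail >= 4.5 AND retail < 5.5",
        };

        using (var conn = new OleDbConnection(
            @"Provider=Microsoft.ACE.OLEDB.12.0;Data Source=C:\data\stock.accdb"))
        {
            conn.Open();
            using (var tx = conn.BeginTransaction())
            {
                foreach (var sql in statements)
                    using (var cmd = new OleDbCommand(sql, conn, tx))
                        cmd.ExecuteNonQuery();   // one round trip updates many rows
                tx.Commit();
            }
        }
    }
}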

Failing that, post a bit more detail if you can!

Derek Tomes