
I'm trying to record log entries in my database, and my question is which approach puts less load on the database when writing the logs. I'm planning to store long-term logs, maybe 3-5 years at most, for an Inventory Program.

Process: I'll be using a barcode scanner. After scanning a barcode, I'll collect the details of who is logged in, the date and time, and the product details, then save a record per piece.

I came up with two ideas.

  1. After each scanning event, the record is saved to a DataTable; after a batch is finished, the DataTable is written to a *.txt file and then uploaded to my database.
  2. After every scanned barcode, an INSERT query is executed. I suspect this option will be heavy on the server side since I'm not the only one using this server. (A rough sketch of what I mean is below.)
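
For reference, option 2 would look roughly like this on my side; the ScanLog table and its column names are just placeholders for illustration:

```csharp
// Minimal sketch of option 2: one parameterized INSERT per scan.
// Table and column names (ScanLog, UserName, ScannedAt, Barcode) are placeholders.
using System;
using System.Data.SqlClient;

public static class ScanLogger
{
    public static void LogScan(string connectionString, string userName, string barcode)
    {
        const string sql =
            "INSERT INTO ScanLog (UserName, ScannedAt, Barcode) " +
            "VALUES (@userName, @scannedAt, @barcode)";

        using (var connection = new SqlConnection(connectionString))
        using (var command = new SqlCommand(sql, connection))
        {
            command.Parameters.AddWithValue("@userName", userName);
            command.Parameters.AddWithValue("@scannedAt", DateTime.UtcNow);
            command.Parameters.AddWithValue("@barcode", barcode);

            connection.Open();
            command.ExecuteNonQuery(); // one round trip per scanned item
        }
    }
}
```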

What are the pros and cons of the two options?
Are there more efficient ways of storing logs?

Hexxed
  • Use the second. It is really the right way to use a database -- scan the data and put it in safe, ACID-compliant storage. – Gordon Linoff Apr 01 '16 at 02:42
  • What amount of data loss is acceptable? What level of latency is acceptable? (Latency = time between the scanning event and the log appearing in the database.) I suggest you look into queues. Your scanner throws its individual log events into a queue - it doesn't know anything about a database. The SQL Server reads things out of the queue - it doesn't know anything about scanners. If there is some kind of throughput mismatch on either side, the queue handles it. Your option 1 is a queue set up on the scanner side, but I'm talking about a standalone queue in the middle. – Nick.Mc Apr 01 '16 at 04:19
  • @Nick.McDermaid from what I gathered, there should be no data loss. Data loss will cause variance, which should not be there. As for the latency, it should be a relatively short amount of time (0.5 ms ~ 2 sec). Thank you for the insight. – Hexxed Apr 01 '16 at 05:24
  • It's often a question of: Do you want 0.1% possible data loss or 0.001% possible data loss for an extra million bucks? By latency I mean: between the scan activity and the end user seeing it in their report. Do you really need the scan rate seen on the end users report to be no more than 2 seconds out of date? Your datatable/txt file approach alone would mean more than 2 seconds latency. Another question is do you want to see historical trends and how long for? These three elements (data loss, latency, history) all have a cost and the business should understand this. – Nick.Mc Apr 01 '16 at 06:07

3 Answers


Based on your use case, I also think you need to consider at least two additional factors. The first is how important it is that the scanned item is logged in the database immediately. If you need the scanned item logged because you'll be checking whether it's been scanned, for example to prevent duplicate scans, then doing a single insert is probably a very good idea. The second thing to consider is whether you'll ever need to "unscan" an item, and at which part of the process. If the person scanning needs the ability to revert a scan immediately, it might be a good idea to wait until they're done with all their scanning before dumping the data to the database, as this lets you avoid ever having to delete from the table.

Overall I wouldn't worry too much about what the database can handle; SQL Server is very good at handling simultaneous single inserts into a table that's designed for that use case. If you're only going to be appending new data to the end of the table, and not updating or deleting existing records, performance is going to scale very well. The same goes for larger batch inserts; they're very efficient no matter how many rows you bring in, assuming your table is designed for that purpose.
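
As a rough illustration of the batch path (not necessarily how you should structure it), here's a minimal SqlBulkCopy sketch that pushes a whole DataTable to the server in one call; the table and column names are placeholders:

```csharp
// Minimal sketch of a batch upload: write an in-memory DataTable straight to
// SQL Server with SqlBulkCopy. Table and column names are placeholders.
using System.Data;
using System.Data.SqlClient;

public static class BatchUploader
{
    public static void Upload(string connectionString, DataTable scans)
    {
        using (var bulkCopy = new SqlBulkCopy(connectionString))
        {
            bulkCopy.DestinationTableName = "dbo.ScanLog";
            bulkCopy.BatchSize = 1000; // rows sent to the server per batch

            bulkCopy.ColumnMappings.Add("UserName", "UserName");
            bulkCopy.ColumnMappings.Add("ScannedAt", "ScannedAt");
            bulkCopy.ColumnMappings.Add("Barcode", "Barcode");

            bulkCopy.WriteToServer(scans); // single bulk operation
        }
    }
}
```

Because the rows are streamed through the bulk-load interface, one upload like this costs far fewer round trips than issuing the same number of single INSERTs.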

So overall I would probably pick the more efficient solution from the application side for your specific use case, and then once you have decided that, you can shape the database around the code, rather than trying to shape your code around suspected limitations of the database.

Unicorno Marley
  • Thank you. However, it's not the limitations of the database that I'm worried about but the amount of traffic the server can handle. But that is probably outside the scope of my question now. – Hexxed Apr 01 '16 at 05:20
  • 1
    @Hexxed It might be a factor, but as an example, I work with a logging database that gets around 54,000 inserts per minute into 2 tables (27k for meta data, 27k for extended data) from ~180 different sources, and it runs fine. The same database also stores config data and gets about 3000 concurrent readers. We have peaked a couple times because of bad code, the breaking point was somewhere around 1.9m inserts per minute at which point we run out of connections. It runs on fairly decent hardware (512gb ram, dual 3.7ghz xeons, 10gb network), but that gives you an idea of single insert scale. – Unicorno Marley Apr 01 '16 at 13:23

What are the pros and cons of the two options?

Basically, your question is which way is more efficient: bulk insert or multiple single inserts?

The answer is always "it depends" and is always situation-based, so unfortunately I don't think there's a single right answer for you. It depends on things like:

  1. The way you structure the log table.
  2. If you choose bulk insert, how many rows do you want to insert at a time?
  3. Is it read-only table? And if you want to read from it, how often do you do the read?
  4. Do you need to scale it up?
  5. etc...

Are there more efficient ways of storing logs?

There are some possible improvements I can think of (not all of them can work together):

  1. If you go with the first option, maybe you can schedule the insert for non-peak hours
  2. If you go with the first option, chunk the log files and insert them chunk by chunk (see the sketch after this list)
  3. Use another database to do the logging
  4. If you go with the second option, do some load testing
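
A rough sketch of point 2, assuming the *.txt log holds one tab-separated record per line (user, timestamp, barcode) and a placeholder ScanLog table; each chunk is uploaded separately so no single statement gets too large:

```csharp
// Rough sketch of chunked inserts from the batch log file.
// File format, table and column names are assumptions for illustration.
using System;
using System.Data;
using System.Data.SqlClient;
using System.IO;

public static class ChunkedImporter
{
    public static void Import(string connectionString, string logFilePath, int chunkSize = 500)
    {
        var chunk = NewChunk();

        foreach (var line in File.ReadLines(logFilePath))
        {
            var fields = line.Split('\t');
            chunk.Rows.Add(fields[0], DateTime.Parse(fields[1]), fields[2]);

            if (chunk.Rows.Count >= chunkSize)
            {
                Flush(connectionString, chunk);
                chunk = NewChunk();
            }
        }

        if (chunk.Rows.Count > 0)
            Flush(connectionString, chunk); // remainder of the last chunk
    }

    private static DataTable NewChunk()
    {
        var table = new DataTable();
        table.Columns.Add("UserName", typeof(string));
        table.Columns.Add("ScannedAt", typeof(DateTime));
        table.Columns.Add("Barcode", typeof(string));
        return table;
    }

    private static void Flush(string connectionString, DataTable chunk)
    {
        using (var bulkCopy = new SqlBulkCopy(connectionString))
        {
            bulkCopy.DestinationTableName = "dbo.ScanLog";
            bulkCopy.ColumnMappings.Add("UserName", "UserName");
            bulkCopy.ColumnMappings.Add("ScannedAt", "ScannedAt");
            bulkCopy.ColumnMappings.Add("Barcode", "Barcode");
            bulkCopy.WriteToServer(chunk);
        }
    }
}
```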

Personally, I prefer to go with the second option if the project is small to medium size and logging is a critical part of the project.

Hope it helps.

Kien Chu
  • Thank you for the enlightenment. The project is large, at corporation level. Yes, logs are critical for checking, progress, and efficiency reports. – Hexxed Apr 01 '16 at 03:02

Go with the second option, and use transactions. This way the data will not be committed to the database until you complete the transaction (which can be scheduled). This will also prevent broken data from getting into your database if a crash or something similar occurs.
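
Something like this minimal sketch, assuming a placeholder ScanLog table: the rows only become visible when Commit() is called, and a failure mid-batch rolls everything back.

```csharp
// Minimal sketch of batching scans inside one transaction: nothing is
// committed until Commit() is called. Table and column names are placeholders.
using System;
using System.Collections.Generic;
using System.Data.SqlClient;

public static class TransactionalLogger
{
    public static void SaveBatch(string connectionString, string userName, IEnumerable<string> barcodes)
    {
        using (var connection = new SqlConnection(connectionString))
        {
            connection.Open();

            using (var transaction = connection.BeginTransaction())
            {
                try
                {
                    foreach (var barcode in barcodes)
                    {
                        using (var command = new SqlCommand(
                            "INSERT INTO ScanLog (UserName, ScannedAt, Barcode) " +
                            "VALUES (@user, @scannedAt, @barcode)", connection, transaction))
                        {
                            command.Parameters.AddWithValue("@user", userName);
                            command.Parameters.AddWithValue("@scannedAt", DateTime.UtcNow);
                            command.Parameters.AddWithValue("@barcode", barcode);
                            command.ExecuteNonQuery();
                        }
                    }

                    transaction.Commit();   // all rows become visible together
                }
                catch
                {
                    transaction.Rollback(); // no partial batch on failure
                    throw;
                }
            }
        }
    }
}
```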

Transactions in .net

Transaction Tutorial in C#

Nobody