Suppose I have a large number of log files for a given system, on the order of petabytes of data.
Technology Used
- I'm going to use C/C++ for this.
My Problem
- I need to read these files from disk and then process them, for example by publishing them to a topic on a pub/sub system or simply displaying the logs on screen.
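
To make the read step concrete, here is a minimal sketch of what I have in mind, assuming standard C++ streams. The names `read_in_chunks` and `handle_chunk` are hypothetical; the callback stands in for whatever processing comes next (publishing to a topic or printing).

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <functional>
#include <string>
#include <vector>

// Hypothetical helper: reads `path` in chunks of at most `buffer_size` bytes
// and passes each chunk to `handle_chunk`.
// Returns the total number of bytes read, or -1 if the file cannot be opened.
long long read_in_chunks(
    const std::string& path,
    std::size_t buffer_size,
    const std::function<void(const char*, std::size_t)>& handle_chunk) {
    assert(buffer_size > 0);  // a zero-sized buffer can never make progress

    std::ifstream in(path, std::ios::binary);
    if (!in) {
        return -1;
    }

    std::vector<char> buffer(buffer_size);  // the buffer whose size is in question
    long long total = 0;
    while (true) {
        in.read(buffer.data(), static_cast<std::streamsize>(buffer.size()));
        const std::streamsize got = in.gcount();  // bytes actually read this pass
        if (got <= 0) {
            break;  // end of file or read error
        }
        handle_chunk(buffer.data(), static_cast<std::size_t>(got));
        total += got;
    }
    return total;
}
```

A call such as `read_in_chunks("app.log", 64 * 1024, callback)` would hand the file to `callback` in pieces of up to 64 KB; the file name and the 64 KB value are placeholders, not recommendations.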
Questions
- What buffer size gives the best read performance while keeping the use of hardware resources, such as disk I/O and RAM, low?
- I don't know whether I should choose 64 KB, 128 KB, 5 MB, or 10 MB. How do I calculate this?
- If the calculation depends on the resources I have available, how do I derive the buffer size from them?
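
One empirical way to approach these questions is to time the same read with each candidate size and compare. The sketch below is only an illustration under assumptions: `ReadTiming`, `time_read`, and `compare_buffer_sizes` are hypothetical names, `std::ifstream` adds its own buffering layer on top of the operating system, and a second pass over the same file may be served from the OS page cache, so the numbers are only indicative.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical result record for one candidate buffer size.
struct ReadTiming {
    std::size_t buffer_size;  // bytes requested per read call
    long long bytes_read;     // total bytes read, or -1 if the file did not open
    double seconds;           // wall-clock time for one full pass over the file
};

// Reads the whole file once with the given buffer size and times the pass.
ReadTiming time_read(const std::string& path, std::size_t buffer_size) {
    assert(buffer_size > 0);
    std::vector<char> buffer(buffer_size);

    const auto start = std::chrono::steady_clock::now();
    std::ifstream in(path, std::ios::binary);
    long long total = in ? 0 : -1;
    while (in) {
        in.read(buffer.data(), static_cast<std::streamsize>(buffer.size()));
        const std::streamsize got = in.gcount();
        if (got <= 0) {
            break;  // end of file or read error
        }
        total += got;
    }
    const std::chrono::duration<double> elapsed =
        std::chrono::steady_clock::now() - start;

    return ReadTiming{buffer_size, total, elapsed.count()};
}

// Times one pass per candidate size, in the order given.
std::vector<ReadTiming> compare_buffer_sizes(
    const std::string& path, const std::vector<std::size_t>& sizes) {
    std::vector<ReadTiming> results;
    results.reserve(sizes.size());
    for (const std::size_t size : sizes) {
        results.push_back(time_read(path, size));
    }
    return results;
}
```

Passing the sizes from my question, for example `{64 * 1024, 128 * 1024, 5 * 1024 * 1024, 10 * 1024 * 1024}`, against a representative log file would give one timing per size. Dividing `bytes_read` by `seconds` gives throughput, and `buffer_size` itself is the RAM cost per open reader.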