Doing your task in CUDA will not help much over doing the same thing on the CPU.
Assuming that your files are stored on a standard, magnetic HDD, the typical single-threaded CPU program would consume:
- About 5ms to find the sector where the file is stored and move it under the read head.
- About 10ms to load a 1MB file (assuming a 100MB/s read speed) into RAM.
- Less than 0.1ms to load 1MB data from RAM to CPU cache and process it using a linear search algorithm.
That is 15.1ms for a single file. If you have 1000 files, it will take 15.1s to do the work.
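The arithmetic above can be sketched in a few lines (the constants are the assumptions from the list, not measurements):

```python
# Back-of-envelope cost model for reading and scanning many small files
# from a magnetic HDD, using the figures assumed above.
SEEK_MS = 5.0      # average seek + rotational latency per file
READ_MS = 10.0     # 1 MB at ~100 MB/s sequential read
SEARCH_MS = 0.1    # in-RAM linear search over 1 MB

N_FILES = 1000

per_file_ms = SEEK_MS + READ_MS + SEARCH_MS   # ~15.1 ms
total_s = per_file_ms * N_FILES / 1000.0      # ~15.1 s for 1000 files
print(per_file_ms, total_s)
```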
Now, if I give you a super-powerful GPU with infinite memory bandwidth, zero latency, and infinite processor speed, you will be able to perform step 3 (the in-memory search) in no time. However, the HDD reads will still consume exactly the same time. A GPU cannot speed up the work of another, independent device.
As a result, instead of spending 15.1s, you will now do it in 15.0s.
The infinite GPU would give you a 0.6% speedup. A real GPU would not even come close to that!
In the more general case: if you are considering CUDA, ask yourself: is the actual computation the bottleneck of the problem?
- If yes - continue searching for possible solutions in the CUDA world.
- If no - CUDA cannot help you.
If you deal with thousands of tiny files and you need to read them often, consider techniques that attack your actual bottleneck. Some may include:
- RAM buffering
- Putting your hard drives in a RAID configuration
- Getting an SSD
There may be more options; I am not an expert in that area.
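To illustrate the first option: RAM buffering can be as simple as reading each file once and keeping its contents in memory, so repeated searches pay the disk cost only on the first pass. A minimal sketch (the `FileCache` class and `search_all` are illustrative names, not a library API):

```python
class FileCache:
    """Keep file contents in RAM so repeated searches skip the disk."""

    def __init__(self):
        self._cache = {}

    def read(self, path):
        # Hit the disk only the first time a path is requested.
        if path not in self._cache:
            with open(path, "rb") as f:
                self._cache[path] = f.read()
        return self._cache[path]

    def search_all(self, paths, needle):
        """Return the paths whose (cached) contents contain `needle`."""
        return [p for p in paths if needle in self.read(p)]
```

After the first `search_all` call, every subsequent search runs at RAM speed, which is exactly the part of the workload a faster processor could then help with.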