This is, by and large, a combination of a design problem and a code problem.
Use Case
- Given many log files ranging in size from 2MB to 2GB, I need to parse each log, apply some processing, and generate Java POJOs.
- For this problem, let's assume that we have just 1 log file.
- The idea is also to make the best use of the system. Multiple cores are available.
Alternative 1
- Open file (synchronous), read each line, generate POJOs
FileActor -> read each line -> List<POJO>
Pros: simple to understand
Cons: serial processing; does not take advantage of the multiple cores in the system
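To make Alternative 1 concrete, here is a minimal sketch of what I have in mind, written in plain Java without any actor. The `LogEntry` POJO and the `<LEVEL> <message>` line format are assumptions made up for illustration; the real fields depend on the log format.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class SerialLogParser {

    // Hypothetical POJO; the real one would carry whatever the logs contain.
    public static final class LogEntry {
        public final String level;
        public final String message;

        public LogEntry(String level, String message) {
            this.level = level;
            this.message = message;
        }
    }

    // Assumed line format: "<LEVEL> <message>".
    public static LogEntry parseLine(String line) {
        int space = line.indexOf(' ');
        if (space < 0) {
            return new LogEntry(line, "");
        }
        return new LogEntry(line.substring(0, space), line.substring(space + 1));
    }

    // Reads the lines one after another on the calling thread and skips empty lines.
    public static List<LogEntry> parse(Stream<String> lines) {
        return lines
                .filter(line -> !line.isEmpty())
                .map(SerialLogParser::parseLine)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // For a real file, java.nio.file.Files.lines(path) would supply the stream.
        List<LogEntry> entries = parse(Stream.of("INFO started", "ERROR disk full"));
        for (LogEntry entry : entries) {
            System.out.println(entry.level + " | " + entry.message);
        }
    }
}
```

Everything here runs on a single thread, which is exactly the drawback listed under Cons.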
Alternative 2
- Open file (synchronous), read N lines (N is configurable), pass them on to different actors to process
                                                   / LogLineProcessActor 1
FileActor -> LogLineProcessRouter (with 10 Actors) -- LogLineProcessActor 2
                                                   \ LogLineProcessActor 10
Pros: some parallelization, since different actors each process a batch of lines. The actors should make use of the available cores in the system (I am not sure exactly how).
Cons: still serial, because the file is read in a serial fashion
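To illustrate the shape of Alternative 2 without committing to a particular actor API, here is a rough sketch that swaps the router and its actors for a plain `java.util.concurrent` fixed thread pool: one reader thread cuts the input into batches of N lines, and the pool workers turn each batch into POJOs. The `LogEntry` class, the `<LEVEL> <message>` line format, and the parameter names are assumptions for illustration only.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class BatchedLogParser {

    // Hypothetical POJO; the real one would carry whatever the logs contain.
    public static final class LogEntry {
        public final String level;
        public final String message;

        public LogEntry(String level, String message) {
            this.level = level;
            this.message = message;
        }
    }

    // Assumed line format: "<LEVEL> <message>".
    static LogEntry parseLine(String line) {
        int space = line.indexOf(' ');
        if (space < 0) {
            return new LogEntry(line, "");
        }
        return new LogEntry(line.substring(0, space), line.substring(space + 1));
    }

    // One thread reads and batches; `workers` pool threads parse the batches.
    public static List<LogEntry> parse(Iterator<String> lines, int batchSize, int workers) {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            List<Future<List<LogEntry>>> pending = new ArrayList<>();
            while (lines.hasNext()) {
                List<String> batch = new ArrayList<>(batchSize);
                while (lines.hasNext() && batch.size() < batchSize) {
                    batch.add(lines.next());
                }
                pending.add(pool.submit(() -> {
                    List<LogEntry> parsed = new ArrayList<>(batch.size());
                    for (String line : batch) {
                        parsed.add(parseLine(line));
                    }
                    return parsed;
                }));
            }
            // Collect in submission order so the result keeps the file's line order.
            List<LogEntry> results = new ArrayList<>();
            for (Future<List<LogEntry>> future : pending) {
                results.addAll(future.get());
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while parsing", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a batch failed to parse", e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("INFO a", "WARN b", "ERROR c", "INFO d", "DEBUG e");
        List<LogEntry> entries = parse(lines.iterator(), 2, 3);
        System.out.println(entries.size() + " entries parsed");
    }
}
```

This sketch keeps every pending batch and every result in memory, so for a 2GB file a real version would need to limit how many batches are in flight and stream the results onward. The read itself is still serial, as noted under Cons.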
Questions
- Is either of the above a good choice?
- Are there better alternatives?
Please provide valuable thoughts here
Thanks a lot