What I'm asking here is probably a common task, but I want to figure out the best way to do it.
- I have a list of files (say n) within a directory, all of which have been categorized by extension.
- I have a CSV file containing regex patterns (say m) that I want to look for in all files of a particular type.
- I want a final output that lists the regex pattern, file name, line, and line number for each match.
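To make the desired output concrete, here is a rough sketch of the per-file step as I currently picture it. The class name `RegexScan`, the `Hit` row type, and the tab-separated `toString` are placeholders I made up for illustration. It is the naive version that tries every pattern on every line, so it is a baseline rather than a proposed solution.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class RegexScan {

    /** One output row: pattern, file name, line number, line text (hypothetical type). */
    public static final class Hit {
        public final String pattern;
        public final String file;
        public final int lineNumber;
        public final String line;

        public Hit(String pattern, String file, int lineNumber, String line) {
            this.pattern = pattern;
            this.file = file;
            this.lineNumber = lineNumber;
            this.line = line;
        }

        @Override
        public String toString() {
            return pattern + "\t" + file + "\t" + lineNumber + "\t" + line;
        }
    }

    /** Tries every precompiled pattern against every line; line numbers are 1-based. */
    public static List<Hit> scan(List<Pattern> patterns, String fileName, List<String> lines) {
        List<Hit> hits = new ArrayList<>();
        for (int i = 0; i < lines.size(); i++) {
            String line = lines.get(i);
            for (Pattern p : patterns) {
                // find() matches anywhere in the line; matches() would require the whole line
                if (p.matcher(line).find()) {
                    hits.add(new Hit(p.pattern(), fileName, i + 1, line));
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<Pattern> patterns = List.of(Pattern.compile("TODO"), Pattern.compile("\\bfoo\\d+"));
        List<String> lines = List.of("int foo1 = 0;", "// TODO remove", "return;");
        scan(patterns, "Example.java", lines).forEach(System.out::println);
    }
}
```

The patterns are compiled once and reused for every line, which is the one optimization I am already sure about.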
Here are a few questions I have about how to approach this:
- Is there a way to avoid m*n operations?
- Which is faster: reading each file, buffering its content, and storing every line in, say, an array before searching for all the regex patterns, or taking one regex pattern at a time, reading the file line by line, and searching as I parse so that I don't use up memory?
- I figure that read/write operations are the most taxing, so I want to have n+1 reads (the n files plus the CSV) and just a single write at the very end. Are my assumption and approach here correct?
- Arrays, lists, hash maps, something else: any suggestion on which data structure would be best for this task? I suspect that how the files are parsed is the key to efficiency.
- Are there any lesser-known Java APIs I could use that would reduce the code significantly?
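For reference, this is roughly the streaming variant I am weighing, using `java.nio.file`: each file is read exactly once, line by line, all patterns are tried on each line, and the rows are written in one go at the end. The class and method names are my own placeholders. I am also assuming the pattern file holds one regex per line (a real CSV with quoting or multiple columns would need proper parsing) and that all files are valid UTF-8.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class StreamingRegexScan {

    /** Compiles each non-blank line of the pattern file once (assumption: one regex per line). */
    public static List<Pattern> loadPatterns(Path patternFile) throws IOException {
        List<Pattern> patterns = new ArrayList<>();
        for (String raw : Files.readAllLines(patternFile, StandardCharsets.UTF_8)) {
            String trimmed = raw.trim();
            if (!trimmed.isEmpty()) {
                patterns.add(Pattern.compile(trimmed));
            }
        }
        return patterns;
    }

    /** Reads every file with the given extension once and returns tab-separated result rows. */
    public static List<String> scanDirectory(Path dir, String extension, List<Pattern> patterns)
            throws IOException {
        List<Path> files;
        try (Stream<Path> walk = Files.walk(dir)) {
            files = walk.filter(Files::isRegularFile)
                        .filter(p -> p.getFileName().toString().endsWith(extension))
                        .sorted() // deterministic output order
                        .collect(Collectors.toList());
        }

        List<String> rows = new ArrayList<>();
        for (Path file : files) {
            // Only the current line is held in memory, not the whole file.
            try (BufferedReader reader = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
                String line;
                int lineNumber = 0;
                while ((line = reader.readLine()) != null) {
                    lineNumber++;
                    for (Pattern p : patterns) {
                        if (p.matcher(line).find()) {
                            rows.add(p.pattern() + "\t" + file.getFileName()
                                    + "\t" + lineNumber + "\t" + line);
                        }
                    }
                }
            }
        }
        return rows;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 4) {
            System.err.println("usage: <directory> <extension> <pattern file> <output file>");
            return;
        }
        List<Pattern> patterns = loadPatterns(Paths.get(args[2]));
        List<String> rows = scanDirectory(Paths.get(args[0]), args[1], patterns);
        // Single write at the very end.
        Files.write(Paths.get(args[3]), rows, StandardCharsets.UTF_8);
    }
}
```

This still does m regex searches per line, so it does not answer my m*n question. It only shows the n+1 reads and the single write I described.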
I appreciate any insight/help with respect to this question.