Assume I have four large files (each too large to fit into memory, even individually) that contain information I need to process. I intend to produce a single application-level object (a Record) from each line in file #1. Files #2-4 each hold additional pieces of information required to compose this Record object. For example, the file structures may be as follows:
File #1:
key, description
File #2:
key, metadata, size
File #3:
origin, rate, key
File #4:
key, startDate, endDate
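For concreteness, the composed object might look like the following sketch. The class and field names are purely illustrative, derived from the columns listed above; the real types would depend on the actual data.

```python
from dataclasses import dataclass

# Hypothetical composed Record; fields mirror the columns listed above.
@dataclass
class Record:
    key: str
    description: str   # from file #1
    metadata: str      # from file #2
    size: int          # from file #2
    origin: str        # from file #3
    rate: float        # from file #3
    start_date: str    # from file #4
    end_date: str      # from file #4
```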
Each file has a single column (at a known position within a line) that represents a unique key. This key is shared across files, but there is no guarantee that a key present in one file exists in the others, so we will only process the subset of keys that exist in all four files. The rows of the files are not sorted. Can you devise an algorithm to produce the app-level objects by processing these files?