
Hi all, please help me achieve this scenario: I have multiple files like aaa.txt, bbb.txt, ccc.txt with data as follows.

aaa.txt:

100110,StringA,22
200110,StringB,2
300110,StringC, 12
400110,StringD,34
500110,StringE,423

bbb.txt as:

100110,StringA,20.1
200110,StringB,2.1  
300110,StringC, 12.2
400110,StringD,3.2
500110,StringE,42.1

and ccc.txt as:

100110,StringA,2.1
200110,StringB,2.1  
300110,StringC, 11
400110,StringD,3.2
500110,StringE,4.1

Now I have to read all three files (huge files) and report the result as 100110: (22, 20.1, 2.1). The issue is the size of the files and how to do this in an optimized way.

Logicalj
  • You're going to have to read all 3 files regardless. Is your issue with how to relate all 3 files together? – Compass Oct 10 '14 at 18:40
  • No, the issue is how to read all three huge files in an optimized way and report them accordingly, using multi-threading. – Logicalj Oct 10 '14 at 18:53
  • What's the definition for "huge file"? The really important thing here is to understand if, by requirement, the files are not expected to fit in memory. Can you please elaborate on that? Also, do the files all have the same order for the first field of each line, or can they be in different order? This changes the way the data can be read and has a critical impact on the performance you can expect. – Lolo Oct 11 '14 at 19:19

3 Answers


I assume you have some sort of code to read the files line by line, so I'll write it as if aaa, bbb and ccc are Scanners that keep pulling lines.

The easiest way to handle this would be to use a Map. In this case, I'll just use a HashMap.

    HashMap<String, String[]> map = new HashMap<>();

    while (aaa.hasNextLine()) {
        String[] lineContents = aaa.nextLine().split(",");
        String[] array = new String[3];
        array[0] = lineContents[2].trim();
        map.put(lineContents[0], array);
    }

    while (bbb.hasNextLine()) {
        String[] lineContents = bbb.nextLine().split(",");
        String[] array = map.get(lineContents[0]);
        if (array == null) {
            // key was missing from aaa.txt: create its slot array first
            array = new String[3];
            map.put(lineContents[0], array);
        }
        // the array object is already in the map, so updating it in place is enough
        array[1] = lineContents[2].trim();
    }

    // same for ccc, writing into index 2

To make this thread-safe, you would use a concurrent map such as a ConcurrentHashMap instead of the plain HashMap.

Then you'd create 3 threads that just read and put.
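
As a minimal sketch of that idea (the ParallelLoader class name, the hard-coded file names from the question, and the one-thread-per-file layout are my assumptions, not part of the original answer): a ConcurrentHashMap shared by three loader threads, each writing its file's value into its own slot of the per-key array.

    import java.io.IOException;
    import java.io.UncheckedIOException;
    import java.nio.file.Files;
    import java.nio.file.Paths;
    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;
    import java.util.stream.Stream;

    public class ParallelLoader {

        public static void main(String[] args) throws InterruptedException {
            Map<String, String[]> map = new ConcurrentHashMap<>();
            String[] files = {"aaa.txt", "bbb.txt", "ccc.txt"};

            Thread[] threads = new Thread[files.length];
            for (int i = 0; i < files.length; i++) {
                final int slot = i;
                threads[i] = new Thread(() -> load(files[slot], slot, map));
                threads[i].start();
            }
            for (Thread t : threads) {
                t.join(); // wait for all loaders; join also makes their writes visible here
            }

            // Report each key as: 100110: (22, 20.1, 2.1)
            // (a slot left null, i.e. a key missing from one file, prints as "null")
            map.forEach((key, values) ->
                    System.out.println(key + ": (" + String.join(", ", values) + ")"));
        }

        private static void load(String file, int slot, Map<String, String[]> map) {
            try (Stream<String> lines = Files.lines(Paths.get(file))) {
                lines.forEach(line -> {
                    String[] parts = line.split(",");
                    // computeIfAbsent is atomic on ConcurrentHashMap, so two threads
                    // seeing the same key for the first time never clobber each other
                    String[] array = map.computeIfAbsent(parts[0], k -> new String[3]);
                    array[slot] = parts[2].trim();
                });
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
    }

Because each thread writes a different slot, they never touch the same array element; whether this actually beats a single sequential pass depends on the storage, as the next answer points out.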

Compass

Unless you are doing a lot of processing while loading these files, or are reading a lot of smaller files, the work is dominated by disk I/O, so it might well work better as a sequential operation.

PeterK

If your files are all ordered on the first field, simply maintain an array of Scanners, one per file, read the lines in lock step, and write each merged line to the result file as you go.

Doing so, you only ever keep as many lines in memory as there are files. It is both time- and memory-efficient.

If your files are not ordered, you can sort them first with the sort command (e.g. sort -t, -k1,1 aaa.txt -o aaa_sorted.txt).
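
A minimal sketch of that lock-step merge, assuming (as in the question) the file names aaa.txt, bbb.txt, ccc.txt and that every file is sorted on the first field and contains the same keys; a robust version would also verify that the keys actually match on each line:

    import java.io.File;
    import java.io.FileNotFoundException;
    import java.util.Scanner;

    public class MergeSorted {

        public static void main(String[] args) throws FileNotFoundException {
            String[] files = {"aaa.txt", "bbb.txt", "ccc.txt"};
            Scanner[] scanners = new Scanner[files.length];
            for (int i = 0; i < files.length; i++) {
                scanners[i] = new Scanner(new File(files[i]));
            }

            // One line per file in memory at a time, no matter how big the files are.
            while (scanners[0].hasNextLine()) {
                StringBuilder out = new StringBuilder();
                for (int i = 0; i < scanners.length; i++) {
                    String[] parts = scanners[i].nextLine().split(",");
                    // parts[0] is the key, parts[2] the value; this is where a
                    // robust version would check the key against the other files
                    if (i == 0) {
                        out.append(parts[0]).append(": (");
                    } else {
                        out.append(", ");
                    }
                    out.append(parts[2].trim());
                }
                System.out.println(out.append(')'));
            }

            for (Scanner s : scanners) {
                s.close();
            }
        }
    }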

Jean Logeart