
I have two files; assume both are already sorted. This is just example data. In reality I'll have around 30-40 million records in each file, with a file size of 7-10 GB because the row length is large and fixed. They are simple text files. Once a searched record is found, I'll make some updates and write it to a file.

File A may contain zero or more records matching an ID from File B. The motive is to complete this processing in the least amount of time possible. I am able to do it, but it is a time-consuming process. Suggestions are welcome.

File A
1000000001,A
1000000002,B
1000000002,C
1000000002,D
1000000002,D
1000000003,E
1000000004,E
1000000004,E
1000000004,E
1000000004,E
1000000005,E
1000000006,A
1000000007,A
1000000008,B
1000000009,B
1000000010,C
1000000011,C
1000000012,C

File B
1000000002
1000000004
1000000006
1000000008
1000000010
1000000012
1000000014
1000000016
1000000018

// Not working as of now because the logic is wrong.
    private static void readAndWriteFile() {
        
        System.out.println("Read Write File Started.");
        long time = System.currentTimeMillis();
        try(
                BufferedReader in = new BufferedReader(new FileReader(Commons.ROOT_PATH+"input.txt"));
                BufferedReader search = new BufferedReader(new FileReader(Commons.ROOT_PATH+"search.txt"));
                FileWriter myWriter = new FileWriter(Commons.ROOT_PATH+"output.txt");
            ) {
            
            String inLine = in.readLine();
            String searchLine = search.readLine();
            boolean isLoopEnd = true;
            while(isLoopEnd) {
                
                if(searchLine == null || inLine == null) {
                    isLoopEnd = false;
                    break;
                }
                
                 if(searchLine.substring(0, 10).equalsIgnoreCase(inLine.substring(0,10))) {
                     System.out.println("Record Found - " + inLine.substring(0, 10) + " | " + searchLine.substring(0, 10)  );
                     myWriter.write(inLine + System.lineSeparator());
                     inLine = in.readLine();
                 }else {
                     inLine = in.readLine();
                 }
                 
             }
          
            in.close();
            myWriter.close();
            search.close();
        } catch (Exception e) {
            // TODO Auto-generated catch block
            e.printStackTrace();
        }
        System.out.println("Read and Write to File done in - " + (System.currentTimeMillis() - time));
    }
Santosh
  • The logic I want to implement is: read a line from each file and keep moving until a match is found. I am just not able to work out which file pointer to move when. – Santosh Sep 01 '21 at 08:54
  • Since the files are already sorted, this is the answer I was looking for: long searchInt = Long.parseLong(searchLineSubString); long inInt = Long.parseLong(inputLineSubString); // Which pointer to move: if (searchInt < inInt) { searchLine = search.readLine(); } else { inLine = in.readLine(); } – Santosh Sep 01 '21 at 09:47

2 Answers


My suggestion would be to use a database, as said in this answer. Plain text files have a big disadvantage compared to databases, mostly because of the lack of indexes and the other points mentioned in that answer.

So what I would do is create a database (there are lots of good ones out there, such as MySQL and PostgreSQL), create the tables that are needed, and then read the file. Insert each line of the file into the database and use the database to search and update the records.

This may not directly answer your concrete requirement:

Motive is to complete this processing in the least amount of time possible.

But I think it is a worthwhile suggestion. Good luck.

Renis1235
  • Yes, you are right, but here I want to avoid the time consumed by DB inserts and select queries. I will update the question with the logic I want to try using the above code... – Santosh Sep 01 '21 at 08:49
  • Do not reinvent the wheel. Inserting the records once will take a lot less time than searching the whole file every time you need something. – Renis1235 Sep 01 '21 at 08:54

With this approach I am able to process 50M records in 150 seconds on an i3 machine with 4 GB RAM and an SSD.

    import java.io.BufferedReader;
    import java.io.FileReader;
    import java.io.FileWriter;

    private static void readAndWriteFile() {

        System.out.println("Read Write File Started.");
        long time = System.currentTimeMillis();
        // try-with-resources closes all three streams, so no explicit close() calls are needed.
        try (
                BufferedReader in = new BufferedReader(new FileReader(Commons.ROOT_PATH + "input.txt"));
                BufferedReader search = new BufferedReader(new FileReader(Commons.ROOT_PATH + "search.txt"));
                FileWriter myWriter = new FileWriter(Commons.ROOT_PATH + "output.txt");
            ) {

            String inLine = in.readLine();
            String searchLine = search.readLine();
            // Stop as soon as either file is exhausted.
            while (searchLine != null && inLine != null) {

                // Since both files are already sorted, compare the 10-digit IDs numerically.
                long searchInt = Long.parseLong(searchLine.substring(0, 10));
                long inInt = Long.parseLong(inLine.substring(0, 10));

                if (searchInt == inInt) {
                    System.out.println("Record Found - " + inInt + " | " + searchInt);
                    myWriter.write(inLine + System.lineSeparator());
                }

                // Which pointer to move: advance the search file only when its ID is smaller.
                // On a match the input file advances, so repeated IDs in the input are all written.
                if (searchInt < inInt) {
                    searchLine = search.readLine();
                } else {
                    inLine = in.readLine();
                }
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
        System.out.println("Read and Write to File done in - " + (System.currentTimeMillis() - time));
    }
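
To try the pointer-movement logic without the large files, here is a minimal, self-contained sketch of the same merge-style scan that works on any reader and writer. The class name SortedFileMatcher, the method copyMatches, and the constant ID_LENGTH are illustrative names of my own, and the sketch assumes every line starts with a 10-digit numeric ID and that both inputs are sorted ascending, as in the sample data.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;

class SortedFileMatcher {

    // Assumption: every line starts with a 10-digit numeric ID, as in the sample data.
    private static final int ID_LENGTH = 10;

    /**
     * Copies every line of 'in' whose ID also occurs in 'search' to 'out'.
     * Both inputs must be sorted ascending by ID.
     * Returns the number of lines written.
     */
    static int copyMatches(BufferedReader in, BufferedReader search, Writer out) {
        try {
            int written = 0;
            String inLine = in.readLine();
            String searchLine = search.readLine();
            while (inLine != null && searchLine != null) {
                long inId = Long.parseLong(inLine.substring(0, ID_LENGTH));
                long searchId = Long.parseLong(searchLine.substring(0, ID_LENGTH));
                if (searchId < inId) {
                    // The search ID can no longer match anything; move to the next one.
                    searchLine = search.readLine();
                } else {
                    if (searchId == inId) {
                        out.write(inLine);
                        out.write(System.lineSeparator());
                        written++;
                    }
                    // Advance the input on a match too, so repeated IDs are all copied.
                    inLine = in.readLine();
                }
            }
            return written;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // A small slice of the sample data from the question.
        String fileA = String.join("\n",
                "1000000001,A", "1000000002,B", "1000000002,C", "1000000003,E", "1000000004,E");
        String fileB = String.join("\n", "1000000002", "1000000004", "1000000006");
        StringWriter out = new StringWriter();
        int written = copyMatches(new BufferedReader(new StringReader(fileA)),
                new BufferedReader(new StringReader(fileB)), out);
        System.out.println(written + " matching lines:");
        System.out.print(out);
    }
}
```

Each line of either file is read exactly once, so the scan is linear in the combined size of the two files and only needs memory for the two current lines. For the real files, the StringReader and StringWriter in main would be swapped for file-backed readers and a writer.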
Santosh