
I have found this question asked about other languages, but have yet to find a solution for a Java application.

I have a large .txt file with millions of records. Each record is /n delimited. Basically it is a single column of data from a table. The goal is to read the data from the input file and partition it. Then write the partitioned data to a new file. For example, a file with 2 million records will become 200 files with 10,000 records each (with the last file containing <10,000.)

I am successfully reading and partitioning the data. I am successfully creating the first file and it is being named properly.

The problem is that only one file is created, and it is empty. The code as-is compiles and runs without errors or exceptions.

My code is below:

    import java.io.BufferedReader;
    import java.io.BufferedWriter;
    import java.io.FileReader;
    import java.io.FileWriter;
    import java.io.IOException;
    import java.io.StringWriter;
    import java.util.ArrayList;
    import java.util.Collection;
    import java.util.List;
    import java.util.concurrent.atomic.AtomicInteger;
    import java.util.stream.Collectors;

    public class ChunkTextFile {

    private static final String inputFilename = "inputFile.txt";

    public static void main(String[] args) {

        BufferedReader reader = null;

        BufferedWriter fileWriter = null;

        BufferedWriter lineWriter = null;

        StringWriter stringWriter = null;

        // Create an ArrayList object to hold the lines of input file

        List<String> lines = new ArrayList<String>();

        try {
            // Creating BufferedReader object to read the input file

            reader = new BufferedReader(new FileReader("src" + "//" + inputFilename));

            // Reading all the lines of input file one by one and adding them into ArrayList
            String currentLine = reader.readLine();

            while (currentLine != null) {
                lines.add(currentLine);

                currentLine = reader.readLine();

            }
            // End of file read.

           //Partition ArrayList into a collection of smaller Lists<String>
            final AtomicInteger counter = new AtomicInteger(0);
            final int size = 10000;

            Collection<List<String>> partitioned = lines.stream()
                    .collect(Collectors.groupingBy(it -> counter.getAndIncrement() / size)).values();

            //Printing partitions. Each partition will be written to a file.
            //Testing confirms the partitioning works correctly.
            partitioned.forEach(System.out::println);

            //Iterate through the Collections and create a file for List<String> object.
            //Testing confirms that multiple files are created and properly named.
            Integer count = 0;
            for (List<String> chunks : partitioned) {
                // Prepare new incremented file name.
                String outputFile = "batched_items_file_";
                String txt = ".txt";
                count++;


                String filename = outputFile + count + txt;

                // Write file to directory.
                fileWriter = new BufferedWriter(new FileWriter("src" + "//" + outputFile));
                fileWriter = new BufferedWriter(new FileWriter(filename));

                //Iterate through the List of Strings and write each String to the file.
                //Writing is not successful. Only 1 file is created and it is empty.
                for (String chunk : chunks) {
                    stringWriter = new StringWriter();
                    lineWriter = new BufferedWriter(stringWriter);
                    // Prepare list of strings to be written to new file.
                    // Write each item number to file.
                    lineWriter.write(chunk);
                    lineWriter.flush();
                }
                lineWriter.close(); // <- flush the BufferedWriter

                fileWriter.close();
            }

        } catch (IOException e) {
            e.printStackTrace();
        } finally {
            // Closing the resources
            System.out.println("Finished");

            try {
                if (reader != null) {
                    reader.close();
                }

                if (fileWriter != null) {
                    fileWriter.close();
                }

                if (stringWriter != null) {
                    stringWriter.close();
                }
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
    }
}

Input file example:

230449
235659
295377
329921
348526
359836
361447
384723
396202
571490

Thank you in advance.

Holger
Jeremy
  • What does your input file look like? – Nicholas K Feb 22 '19 at 16:09
  • It looks like you've confused `\n` with `/n`, `"\\"` with `"//"`, and you really ought to take a look at `java.nio.file.Files` and Guava's `Lists.partition` and the try-with-resources construct, but it also is not necessary to store all the data in memory at once to perform this operation. You don't get any output since you send all the data to a `StringWriter` instead of a `FileWriter`. – David Conrad Feb 22 '19 at 16:15
  • I didn't check absolutely carefully, but it didn't look like you were changing the name for each file, so you write the file, then you write on top of it, and then on top of it. I think. Check that. – Joseph Larson Feb 22 '19 at 16:21
  • @Jeremy the question is not supposed to transmute into a solution. That’s what the answers are for. – Holger Feb 22 '19 at 17:29
  • @Holger Thank you for rolling the question back. I will provide the working version in my edited answer below. – Jeremy Feb 22 '19 at 17:44

5 Answers


You don't need all those extra writers inside your for loop, and the writer that is supposed to write to the file (fileWriter) is never used. Replace your loop with this one:

for (String chunk : chunks) {
    fileWriter.write(chunk);
}

Tip: the close method automatically flushes the writer for you (there's no need to call fileWriter.flush()). Since a new fileWriter is created for each chunk, close it at the end of every loop iteration rather than relying on a single close() in the finally block, which only reaches the last writer.
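For example, the per-file write can be wrapped in try-with-resources so the writer is closed (and therefore flushed) automatically, even if an exception is thrown. A minimal sketch; writeChunk and the class name are made up for the illustration:

```java
import java.io.BufferedWriter;
import java.io.FileWriter;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;

public class WriteChunkDemo {

    // Hypothetical helper: writes one partition to its own file.
    // try-with-resources closes (and flushes) the writer automatically.
    static void writeChunk(List<String> chunks, String filename) throws IOException {
        try (BufferedWriter fileWriter = new BufferedWriter(new FileWriter(filename))) {
            for (String chunk : chunks) {
                fileWriter.write(chunk);
                fileWriter.newLine(); // keep the output newline-delimited like the input
            }
        }
    }

    public static void main(String[] args) throws IOException {
        writeChunk(Arrays.asList("230449", "235659"), "batched_items_file_1.txt");
        System.out.println(Files.readAllLines(Paths.get("batched_items_file_1.txt")));
        // prints [230449, 235659]
    }
}
```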

André Paris
  • Thank you. This worked perfectly. Now I just need to `\n` delimit the data added to the new files. Instead of 10,000 separate numbers I have 200+ files with one long string of numbers in them. – Jeremy Feb 22 '19 at 16:30
  • Better: use [try-with-resources](https://docs.oracle.com/javase/8/docs/technotes/guides/language/try-with-resources.html) instead of closing manually. – Holger Feb 22 '19 at 17:22

You can simply use:

Path file = Paths.get(filename);
Files.write(file, chunks, Charset.forName("UTF-8"));

Also, make sure count is initialized to 0 before the loop; otherwise it will always be 0.

Overall it will be like this:

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Collectors;

public class ChunkTextFile {

    private static final String inputFilename = "inputFile.txt";

    public static void main(String[] args) {

        BufferedReader reader = null;

        // Create an ArrayList object to hold the lines of the input file
        List<String> lines = new ArrayList<String>();

        try {
            // Creating BufferedReader object to read the input file
            reader = new BufferedReader(new FileReader(inputFilename));

            // Reading all the lines of the input file one by one and adding them to the ArrayList
            String currentLine = reader.readLine();

            while (currentLine != null) {
                lines.add(currentLine);
                currentLine = reader.readLine();
            }
            // End of file read.

            // Partition the ArrayList into a collection of smaller List<String>s
            final AtomicInteger counter = new AtomicInteger(0);
            final int size = 10;

            Collection<List<String>> partitioned = lines.stream()
                    .collect(Collectors.groupingBy(it -> counter.getAndIncrement() / size)).values();

            // Printing partitions. Each partition will be written to a file.
            partitioned.forEach(System.out::println);

            // Iterate through the collection and create a file for each List<String>.
            Integer count = 0;
            for (List<String> chunks : partitioned) {
                // Prepare new incremented file name.
                String outputFile = "batched_items_file_";
                String txt = ".txt";

                count++;

                String filename = outputFile + count + txt;

                Path file = Paths.get(filename);
                Files.write(file, chunks, Charset.forName("UTF-8"));
            }

        } catch (IOException e) {
            e.printStackTrace();
        } finally {
            // Closing the resources
            System.out.println("Finished");

            try {
                if (reader != null) {
                    reader.close();
                }
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
    }
}
Anar Orujov

There are several issues with your code. The files are empty because you don't close the writers. You are even creating redundant writers, as in this sequence:

fileWriter = new BufferedWriter(new FileWriter("src" + "//" + outputFile));
fileWriter = new BufferedWriter(new FileWriter(filename));

To handle resources like readers and writers in the optimal way, use the try-with-resources statement.

The missing newlines are only a small problem.

Further, you are unnecessarily reading the entire input file into heap memory, just to be able to perform a questionable Stream operation on it. While it is possible to stream over a file directly, e.g. with Files.lines, the grouping with an AtomicInteger is not the intended way of using a Stream anyway. And the end result would still hold all the input lines in memory, while it would be straightforward to write the lines to the target file immediately.

A simple and efficient solution would be

import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

public class ChunkTextFile {

    private static final String inputFilename = "inputFile.txt";

    public static void main(String[] args) {
        final int size = 10000;
        try(BufferedReader reader=Files.newBufferedReader(Paths.get("src", inputFilename))) {
            String line = reader.readLine();
            for(int count = 0; line != null; count++) {
                try(BufferedWriter writer = Files.newBufferedWriter(
                        Paths.get("batched_items_file_" + count + ".txt"))) {
                    for(int i = 0; i < size && line != null; i++) {
                        writer.write(line);
                        writer.newLine();
                        line = reader.readLine();
                    }
                }
            }
        }
        catch(IOException ex) {
            ex.printStackTrace();
        }
    }
}
Holger

A StringWriter is not for writing strings, it is for writing to a string.
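A minimal sketch illustrating the distinction (the class name is made up for the example): the data written through the BufferedWriter never reaches any file, because it accumulates in the StringWriter's in-memory buffer.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.StringWriter;

public class StringWriterDemo {
    public static void main(String[] args) throws IOException {
        StringWriter sw = new StringWriter();
        BufferedWriter bw = new BufferedWriter(sw);
        bw.write("230449");
        bw.flush();
        // Nothing reaches the file system; the data lives in an in-memory buffer.
        System.out.println(sw.toString()); // prints 230449
    }
}
```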

David Conrad

I am accepting the answer above as it solved my problem, but I wanted to expand on it for anyone who finds this question and answer. For the created files to be in the same format as the input file (newline-delimited), I changed my code per the accepted answer and added System.lineSeparator().

The final solution looks like this.

fileWriter.write(chunk + System.lineSeparator());

Thank you again for the quick responses.

This is the working version. I recommend commenting out or removing partitioned.forEach(System.out::println); to improve performance.

import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Collectors;

public class ChunkTextFile {

    private static final String inputFilename = "inputFile.txt";

    public static void main(String[] args) {

        BufferedReader reader = null;

        BufferedWriter fileWriter = null;

        // Create an ArrayList object to hold the lines of the input file
        List<String> lines = new ArrayList<String>();

        try {
            // Creating BufferedReader object to read the input file
            reader = new BufferedReader(new FileReader("src" + "//" + inputFilename));

            // Reading all the lines of the input file one by one and adding them to the ArrayList
            String currentLine = reader.readLine();

            while (currentLine != null) {
                lines.add(currentLine);
                currentLine = reader.readLine();
            }
            // End of file read.

            final AtomicInteger counter = new AtomicInteger(0);
            final int size = 10000;

            Collection<List<String>> partitioned = lines.stream()
                    .collect(Collectors.groupingBy(it -> counter.getAndIncrement() / size)).values();

            // Printing partitions. Each partition will be written to a file.
            partitioned.forEach(System.out::println);

            // Iterate through the collection and create a file for each List<String>.
            Integer count = 0;
            for (List<String> chunks : partitioned) {
                // Prepare new incremented file name.
                String outputFile = "batched_items_file_";
                String txt = ".txt";
                count++;

                String filename = outputFile + count + txt;

                // Write file to directory.
                fileWriter = new BufferedWriter(new FileWriter(filename));

                // Iterate through the List of Strings and write each String to the file,
                // followed by a platform-specific line separator.
                for (String chunk : chunks) {
                    fileWriter.write(chunk + System.lineSeparator());
                }

                // Close (and thereby flush) the writer before starting the next file.
                fileWriter.close();
            }

        } catch (IOException e) {
            e.printStackTrace();
        } finally {
            // Closing the resources
            System.out.println("Finished");

            try {
                if (reader != null) {
                    reader.close();
                }

                if (fileWriter != null) {
                    fileWriter.close();
                }
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
    }
}
Jeremy