-1

I have a large file from which I need to remove only a few lines. Is there any way to do this without opening a new file and copying the whole text?

Edit: the main problem is that the program fails when it runs in more than one thread with large text files.

JohnnyF

3 Answers

2

Is there any way to do this without opening a new file and copying the whole text?

No, there isn't. Certainly not if you want to do it safely.

And RandomAccessFile won't really help you either. It would allow you to replace a sequence of bytes in the file with an equal number of bytes, but that doesn't amount to deleting a line.
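To illustrate the point about equal-length replacement, here is a minimal sketch. The class name `OverwriteDemo`, the helper `overwrite`, and the sample content are made up for this example. It overwrites three bytes in the middle of a file and leaves the file length unchanged.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical demo class, not part of any answer in this thread.
class OverwriteDemo {

    /**
     * Overwrites bytes starting at the given offset. As long as the write
     * stays inside the existing content, the file length does not change.
     */
    static void overwrite(Path file, long offset, byte[] replacement) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "rw")) {
            raf.seek(offset);
            raf.write(replacement);
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("overwrite-demo", ".txt");
        Files.write(file, "aaa\nbbb\nccc\n".getBytes(StandardCharsets.US_ASCII));

        // Replace "bbb" (offset 4) with "XXX": same number of bytes.
        overwrite(file, 4, "XXX".getBytes(StandardCharsets.US_ASCII));

        // The second line has new content, but it is still there.
        System.out.print(new String(Files.readAllBytes(file), StandardCharsets.US_ASCII));
        System.out.println("size = " + Files.size(file));
    }
}
```

The line is replaced, not removed: every byte after it stays at the same offset.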

You could use a RAF like this:

Given an initial state L1 L2 L3 ... LN, replace L2 L3 ... LN with L3 ... LN (and truncate the leftover bytes at the end of the file)

or you could use the RAF to "slide" the lines one at a time as per @halfbit's answer.

However:

  • In the worst case you are copying the entire file content, and the average case involves reading and writing the bytes of O(N) lines.

  • The simple way of doing this requires holding O(N) lines in memory.

  • The "sliding" approach requires O(N) I/O operations (i.e. system calls).

  • Most importantly: line deletion by in-place file update is risky. If the application is interrupted in the middle of the process (e.g. power failure), then you will end up with a corrupted file.
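For comparison, the safe route (copying to a new file) might look roughly like the sketch below. The class name `SafeLineFilter`, the method `removeLines`, and the UTF-8 encoding are assumptions for illustration. The filtered content goes to a temporary file in the same directory, which is then moved over the original.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.function.Predicate;

// Hypothetical helper class, not part of any answer in this thread.
class SafeLineFilter {

    /** Rewrites the file without the lines for which shouldRemove returns true. */
    static void removeLines(Path file, Predicate<String> shouldRemove) throws IOException {
        // Same directory as the target, so the final move stays on one file system.
        Path dir = file.toAbsolutePath().getParent();
        Path tmp = Files.createTempFile(dir, "filter-", ".tmp");
        try {
            try (BufferedReader in = Files.newBufferedReader(file, StandardCharsets.UTF_8);
                 BufferedWriter out = Files.newBufferedWriter(tmp, StandardCharsets.UTF_8)) {
                String line;
                while ((line = in.readLine()) != null) {
                    if (!shouldRemove.test(line)) {
                        out.write(line);
                        out.write('\n'); // note: line endings are normalized to "\n"
                    }
                }
            }
            // Whether an atomic move replaces an existing target is
            // implementation specific; it does on common POSIX file systems.
            Files.move(tmp, file,
                    StandardCopyOption.REPLACE_EXISTING,
                    StandardCopyOption.ATOMIC_MOVE);
        } finally {
            Files.deleteIfExists(tmp); // no-op after a successful move
        }
    }
}
```

A call such as `SafeLineFilter.removeLines(path, line -> line.contains("00"))` either leaves the original file untouched or replaces it with the complete filtered copy. It only ever holds one line in memory. The replacement file will have the permissions of the temporary file, which may differ from those of the original.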

FWIW: this is not a limitation in Java per se. Rather it is a limitation of the way that modern operating systems represent / model files.

Stephen C
0

Have a look at RandomAccessFile: it lets you position the file pointer at the desired location and move the text.
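A minimal sketch of what "moving the text" could look like, assuming the byte offsets of the line to remove are already known. The class `InPlaceCut` and its method `cut` are hypothetical names. The tail of the file is copied forward over the removed range in chunks, and the file is then truncated with `setLength`.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;

// Hypothetical helper class, not part of any answer in this thread.
class InPlaceCut {

    /** Removes the bytes in [start, end) by shifting the tail forward and truncating. */
    static void cut(RandomAccessFile raf, long start, long end) throws IOException {
        long length = raf.length();
        if (start < 0 || start > end || end > length) {
            throw new IllegalArgumentException("invalid range");
        }
        byte[] buf = new byte[64 * 1024];
        long readPos = end;    // next byte of the tail to copy
        long writePos = start; // where that byte has to go
        while (readPos < length) {
            raf.seek(readPos);
            int n = raf.read(buf, 0, (int) Math.min(buf.length, length - readPos));
            if (n < 0) {
                break; // unexpected end of file
            }
            raf.seek(writePos);
            raf.write(buf, 0, n);
            readPos += n;
            writePos += n;
        }
        raf.setLength(writePos); // drop the now duplicated bytes at the end
    }

    public static void main(String[] args) throws IOException {
        File f = File.createTempFile("cut-demo", ".txt");
        Files.write(f.toPath(), "line 1\nline 2\nline 3\n".getBytes(StandardCharsets.US_ASCII));
        try (RandomAccessFile raf = new RandomAccessFile(f, "rw")) {
            cut(raf, 7, 14); // "line 2\n" occupies bytes 7..13
        }
        System.out.print(new String(Files.readAllBytes(f.toPath()), StandardCharsets.US_ASCII));
    }
}
```

Everything after the removed range still has to be read and written once, and an interruption in the middle of the loop leaves the file in an inconsistent state.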

joey rohan
0

Here is some standalone example code using RandomAccessFile to remove lines without opening a new file, which seems to work for me. (In-place copying is required though.)

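import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.PrintWriter;
import java.io.RandomAccessFile;
import java.util.Scanner;

// Wrapper class added so that the snippet compiles; the name is arbitrary.
public class RemoveLinesInPlace {

    public static void main(String[] args) {
        try {
            // prepare test file
            String path = "/tmp/test.txt";
            writeTestLines(path, 999999);

            // mode "rws": read + write synchronous
            RandomAccessFile raf = new RandomAccessFile(path, "rws");

            // Scanner and PrintWriter share the RAF's file descriptor,
            // and therefore its file pointer
            int bufSize = 1 << 20; // 1 MiB
            Scanner s = new Scanner(new BufferedInputStream(new FileInputStream(raf.getFD()), bufSize));
            PrintWriter pw = new PrintWriter(new BufferedOutputStream(new FileOutputStream(raf.getFD()), bufSize));
            long writeOffset = 0;
            for (int nr = 1; s.hasNextLine(); nr++) {
                String line = s.nextLine();
                // keep every line except line 2 and lines containing "00"
                if (nr != 2 && !line.contains("00")) {
                    // switch to writing: save read offset, seek write offset
                    // (the read offset is how far the buffered reader has read
                    // ahead, not the end of the current line)
                    long readOffset = raf.getFilePointer();
                    raf.seek(writeOffset);
                    // output is buffered; bytes reach the file only when a
                    // buffer fills up during println or on the final flush
                    pw.println(line);
                    // switch to reading: save write offset, seek read offset
                    writeOffset = raf.getFilePointer();
                    raf.seek(readOffset);
                }
            }

            // write buffered output and truncate file
            raf.seek(writeOffset);
            pw.flush();
            raf.setLength(raf.getFilePointer());

            pw.close();
            s.close();
            raf.close();
        } catch (Exception ex) {
            ex.printStackTrace(System.err);
        }
    }

    // writes n numbered lines ("line 1", "line 2", ...) to the given path
    public static void writeTestLines(String path, int n) throws IOException {
        PrintWriter pw = new PrintWriter(path);
        for (int i = 1; i <= n; i++) pw.println("line " + i);
        pw.close();
    }
}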

Note that this code assumes that the line endings read by the Scanner are the same as those produced by PrintWriter, i.e. the platform line separator (e.g. the file must not use just a single line feed on Windows).

Note that the above code could be optimized to avoid rewriting the unchanged head of the file, e.g. by just tracking the write offset at first and only then switching to a "normal" PrintWriter.

halfbit