Deleting a line from a huge file in Perl

Question

I have a huge text file whose first five lines read as follows:

This is fist line
This is second line
This is third line
This is fourth line
This is fifth line

Now I want to write something at an arbitrary position in the third line of that file, replacing the characters there with the new string. I can do that with the code below:

use strict;
use warnings;

my @pos = (0);
open my $fh, "+<", "text.txt" or die "Can't open text.txt: $!";

while (<$fh>) {
    push @pos, tell($fh);
}

seek $fh , $pos[2]+1, 0;
print $fh "HELLO";

close($fh);

However, I cannot figure out how to use the same approach to delete the entire third line from the file, so that the text reads as follows:

This is fist line
This is second line
This is fourth line
This is fifth line

I do not want to read the entire file into an array, nor do I want to use Tie::File. Is it possible to do this using seek and tell? A solution would be very helpful.

Why do you not want to use `Tie::File`? I think it would be ideal for this purpose. — Borodin, Oct 27 '16 at 15:15
@Borodin Even Tie::File will read the file into an array, won't that be memory consuming? Can the -memory option of the module be of some help in that case? — H.Burns, Oct 31 '16 at 09:56

zdim · Accepted Answer · 2018-08-03T07:03:50.857

A file is a sequence of bytes. We can replace (overwrite) some of them, but how would we remove them? Once a file is written its bytes cannot be 'pulled out' of the sequence or 'blanked' in any way. (The ones at the end of the file can be dismissed, by truncating the file as needed.)

The rest of the content has to move 'up', so that what follows the text to be removed overwrites it. We have to rewrite the rest of the file. In practice it is often far simpler to rewrite the whole file.

As a very basic example

use warnings 'all';
use strict;
use File::Copy qw(move);

my $file_in = '...';
my $file_out = '...';  # best use `File::Temp`

open my $fh_in,  '<', $file_in  or die "Can't open $file_in: $!";
open my $fh_out, '>', $file_out or die "Can't open $file_out: $!";

# Remove a line with $pattern
my $pattern = qr/this line goes/;

while (<$fh_in>) 
{
    print $fh_out $_  unless /$pattern/;
}
close $fh_in;
close $fh_out;

# Rename the new file into the original one, thus replacing it
move ($file_out, $file_in) or die "Can't move $file_out to $file_in: $!";

This writes every line of the input file into the output file, unless the line matches a given pattern. Then that file is renamed, replacing the original (which involves no data copy when both are on the same filesystem). See this topic in perlfaq5.

Since this really is a temporary file, I'd recommend the core module File::Temp for it.
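A minimal sketch of the same rewrite using File::Temp, assuming the line to drop is identified by its 1-based number (via `$.`) rather than by a pattern. The sub name `delete_line_via_tempfile` is made up for illustration. The temporary file is created in the original's directory so that the final rename stays on one filesystem.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use File::Basename qw(dirname);

# Hypothetical helper: copy $file to a temporary file, skipping line
# $line_no (1-based), then rename the temporary file over the original.
sub delete_line_via_tempfile {
    my ($file, $line_no) = @_;

    open my $fh_in, '<', $file or die "Can't open $file: $!";

    # Same directory as the original, so rename() does not cross filesystems.
    my ($fh_out, $tmp_name) = tempfile(DIR => dirname($file), UNLINK => 0);

    while (my $line = <$fh_in>) {
        next if $. == $line_no;    # $. is the current input line number
        print {$fh_out} $line or die "Can't write to $tmp_name: $!";
    }
    close $fh_in;
    close $fh_out or die "Can't close $tmp_name: $!";

    rename $tmp_name, $file or die "Can't rename $tmp_name to $file: $!";
    return;
}
```

A call such as `delete_line_via_tempfile('text.txt', 3)` would drop the third line. Note that File::Temp creates its files with restrictive permissions, so the replaced file may need a `chmod` afterwards if the original mode matters.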

This may be made more efficient, but far more complicated, by opening in update '+<' mode so as to overwrite only a portion of the file. You iterate until the line with the pattern has been read, record the position (tell), which is now just past that line, and the line's length, then copy all remaining lines into memory. Then seek back to that position minus the length of the line, and dump the copied rest of the file, overwriting the line and all that follows it. Finally, truncate the file, since it is now shorter by the length of that line.

Note that now the data for the rest of the file is copied twice, albeit one copy is in memory. Going to this trouble may make sense if the line to be removed is far down a very large file. If there are more lines to remove this gets messier.
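The update-mode approach could look roughly like the sketch below. It assumes the line is chosen by its 1-based number, that the file has no encoding layer (so offsets are plain bytes), and that the rest of the file fits in memory. The sub name `delete_line_in_place` is hypothetical. Instead of computing "position minus length" it records the offset just before each line is read, which comes to the same place.

```perl
use strict;
use warnings;

# Hypothetical helper: remove line $line_no (1-based) from $file in place.
# Returns 1 if a line was removed, 0 if the file has fewer lines.
sub delete_line_in_place {
    my ($file, $line_no) = @_;

    open my $fh, '+<', $file or die "Can't open $file: $!";

    my $start;    # byte offset where the line to remove begins
    while (1) {
        my $pos  = tell($fh);
        my $line = <$fh>;
        last unless defined $line;
        if ($. == $line_no) { $start = $pos; last; }
    }
    if (!defined $start) {
        close $fh;
        return 0;
    }

    # Read everything after that line into memory.
    my $rest = do { local $/; <$fh> };
    $rest = '' unless defined $rest;

    # Move back to where the line began and write the rest over it.
    seek($fh, $start, 0) or die "Can't seek in $file: $!";
    print {$fh} $rest or die "Can't write to $file: $!";

    # The file is now shorter, so cut off the leftover tail.
    truncate($fh, tell($fh)) or die "Can't truncate $file: $!";
    close $fh or die "Can't close $file: $!";
    return 1;
}
```

If the process dies between the write and the truncate, the file is left in a mixed state, which is one reason rewriting to a new file tends to be the safer default.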

Writing out a new file and moving it over the original changes the file's inode number. That may be a problem for some tools or procedures, and if it is you can instead update the original in one of two ways:

1. Once the new file is written out, open it for reading and open the original for writing. This clobbers the original file. Then read from the new file and write to the original one, thus copying the content back to the same inode. Remove the new file when done.
2. Open the original file in read-write mode ('+<') to start with. Once the new file is written, seek to the beginning of the original (or to the place from which to overwrite) and write to it the content of the new file. Remember to also set the end-of-file if the new file is shorter,
```
truncate $fh, tell($fh); 
```

after copying is done. This requires some care and the first way is probably generally safer.
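The first way above might be sketched as follows, assuming the new file has already been written and closed. The sub name `copy_back_same_inode` and the 64 KB chunk size are arbitrary choices for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: copy the content of $new_file back into $orig_file
# without replacing the original's inode, then remove $new_file.
sub copy_back_same_inode {
    my ($new_file, $orig_file) = @_;

    open my $in, '<', $new_file or die "Can't open $new_file: $!";
    # Opening with '>' empties the original but keeps its inode.
    open my $out, '>', $orig_file or die "Can't open $orig_file: $!";
    binmode $in;
    binmode $out;

    # Copy in fixed-size chunks so memory use stays small.
    my $buf;
    while (1) {
        my $n = read($in, $buf, 64 * 1024);
        die "Read error on $new_file: $!" unless defined $n;
        last if $n == 0;
        print {$out} $buf or die "Write error on $orig_file: $!";
    }
    close $in;
    close $out or die "Can't close $orig_file: $!";

    unlink $new_file or die "Can't remove $new_file: $!";
    return;
}
```

While the copy is in progress the original is incomplete, so other readers of the file can see partial content. The rename approach does not have that window.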

If the file weren't huge, the new "file" could be "written" in memory, as an array or a string.
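For a file small enough to hold in memory, one way to "write" the new file to a string is to open a filehandle on a scalar reference. This is only a sketch: the sub name `delete_matching_lines_in_memory` is made up, and the pattern-based filter mirrors the basic example earlier in this answer.

```perl
use strict;
use warnings;

# Hypothetical helper: drop every line matching $pattern, building the new
# content in a string before writing it back over the original file.
sub delete_matching_lines_in_memory {
    my ($file, $pattern) = @_;

    open my $fh_in, '<', $file or die "Can't open $file: $!";

    # An in-memory "file": prints to $mem are appended to $buffer.
    my $buffer = '';
    open my $mem, '>', \$buffer or die "Can't open in-memory handle: $!";

    while (my $line = <$fh_in>) {
        print {$mem} $line unless $line =~ $pattern;
    }
    close $fh_in;
    close $mem;

    # Overwrite the original (same inode) with the filtered content.
    open my $fh_out, '>', $file or die "Can't open $file for writing: $!";
    print {$fh_out} $buffer or die "Can't write to $file: $!";
    close $fh_out or die "Can't close $file: $!";
    return;
}
```

It would be called as, for example, `delete_matching_lines_in_memory('text.txt', qr/third/)`.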

my point is can't we overwrite that line with nothing, so that the line ceases to exist and the next line come up automatically? — H.Burns, Oct 26 '16 at 18:05
`This is third line\n` occupies 19 characters. You can only replace it with other 19 characters. — PerlDuck, Oct 26 '16 at 18:07
@H.Burns Right, that's the thing -- there is no 'nothing', it's bytes that are there, so some content. The only way to "remove" it is to move the rest. Imagine a line of little boxes, each with a piece inside -- there has to be _something_ in each. There is no way in the filesystem to magically pluck out a box. The only thing we can do is to move the content of the next box into one we want "removed," etc. The bytes at the end may be discarded. — zdim, Oct 26 '16 at 18:08
@zdim That sounds ok to me, could you please demonstrate a bit of code for moving the content into the one we want to remove and discard the end ? — H.Burns, Oct 26 '16 at 18:14
@H.Burns I added a simple example. It can of course be written more properly, with `if-else` branches etc, if there is a more complicated algorithm for what to move. — zdim, Oct 26 '16 at 18:24
@H.Burns, Think of files as pieces of grid paper. You can't remove a portion of the file any more than you can remove squares from the page. If you want to remove something from the file/page, you need to copy every subsequent byte/square into earlier bytes/squares. The exception is that you can delete from the end of the file by playing with its size. — ikegami, Oct 26 '16 at 18:27
If you always want to remove the third line, then [`$.`](http://perldoc.perl.org/perlvar.html#Variables-related-to-filehandles) (the current line number, 1-based) could help: `print $fh_out $_ unless ($. == 3);`. — PerlDuck, Oct 26 '16 at 18:40
@H.Burns I added a comment on replacing only what is necessary -- I think it's not worth it, unless there is a demonstrated problem with rewriting the whole file. I also added a link to a `perlfaq5` item on this. — zdim, Oct 26 '16 at 19:30

score 0 · Answer 2 · answered Oct 26 '16 at 18:18

Use the sed command from the Linux command line, called from Perl:

my $return = `sed -i '3d' text.txt`;

Where "3d" means delete the 3rd row.

papaiatis

Why is the downvote? OP asked a method to delete a line from a huge file in perl. It does what he wants. – papaiatis Oct 28 '16 at 08:20
Perhaps because this is not exactly a Perl solution but merely a sed solution. Also, the content of `$return` is useless. It's always empty. (I wasn't the downvoter, btw.) – PerlDuck Oct 28 '16 at 17:52

score -1 · Answer 3 · answered Oct 28 '16 at 19:32

It is useful to look at perlrun and see how perl itself modifies a file 'in-place.'

Given:

$ cat text.txt
This is fist line
This is second line
This is third line
This is fourth line
This is fifth line

You can apparently 'modify in-place', sed-like, by using the -i and -p switches to invoke Perl:

$ perl -i -pe 's/This is third line\s*//' text.txt
$ cat text.txt
This is fist line
This is second line
This is fourth line
This is fifth line

But if you consult the Perl Cookbook recipe 7.9 (or look at perlrun) you will see that this:

$ perl -i -pe 's/This is third line\s*//' text.txt

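is roughly equivalent to the following (shown with a `.bak` backup, as `-i.bak` would make; with a bare `-i` no backup is kept):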

while (<>) {
    if ($ARGV ne $oldargv) {           # are we at the next file?
        rename($ARGV, $ARGV . '.bak');
        open(ARGVOUT, ">$ARGV");       # plus error check
        select(ARGVOUT);
        $oldargv = $ARGV;
    }
    s/This is third line\s*//;
}
continue{
    print;
}
select (STDOUT);                      # restore default output
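The same machinery can be used from inside a script by setting `$^I`, the variable behind the `-i` switch. The sketch below assumes the line is selected by number and that a `.bak` backup is wanted. The sub name `delete_line_inplace_edit` is made up. As the expansion above shows, this still rewrites the whole file under the hood.

```perl
use strict;
use warnings;

# Hypothetical helper: delete line $line_no (1-based) from $file using
# Perl's in-place editing, keeping a backup named "$file.bak".
sub delete_line_inplace_edit {
    my ($file, $line_no) = @_;

    # Setting $^I turns on in-place editing for <>, as the -i switch does;
    # local() limits the change to $^I and @ARGV to this sub.
    local ($^I, @ARGV) = ('.bak', $file);

    while (<>) {
        next if $. == $line_no;    # skip the unwanted line
        print;                     # goes to the rewritten file, not STDOUT
    }
    continue {
        close ARGV if eof;         # reset $. once the file is done
    }
    return;
}
```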
