how to delete a large number of lines from a file

Question

I have a file with ~700,000 lines and I would like to remove a bunch of specific lines (~30,000) using bash scripting or another method.

I know I can remove lines using sed:

sed -i.bak -e '1d;34d;45d;678d' myfile.txt # an example

I have the line numbers in a text file, but I don't know whether I can use it as input to sed. Maybe perl?

Thanks

What's the format of the text file? Massage that data so that it looks like a sed expression...although with 30,000 values you may bump into a limit on the size of the argument to sed. — William Pursell, Nov 04 '14 at 02:04
Look at this post, it is very similar... http://stackoverflow.com/questions/26670650/selecting-a-large-number-of-specific-rows-in-file/26672005#26672005 — Mark Setchell, Nov 04 '14 at 09:34

score 2 · Accepted Answer · answered Nov 04 '14 at 03:41

A few options:

sed -f <(sed 's/$/d/' lines_file) data_file

awk 'NR==FNR {del[$1]; next} !(FNR in del)' lines_file data_file

perl -MPath::Class -e '
  %del = map {$_ => 1} file("lines_file")->slurp(chomp => 1);
  $f = file("data_file")->openr();
  while (<$f>) {
    print unless $del{$.};
  }
'
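
To see the awk option at work on something small, here is a minimal sketch. The sample contents of data_file and lines_file and the output name kept are made up for illustration; only the awk one-liner comes from the answer.

```shell
# Work in a throwaway directory so nothing real is touched.
tmpdir=$(mktemp -d)

# Hypothetical sample data: a 10-line file and a list of lines to drop.
seq 1 10 | sed 's/^/row /' > "$tmpdir/data_file"
printf '%s\n' 2 5 9 > "$tmpdir/lines_file"

# First pass (NR==FNR) stores each line number as a key of del.
# Second pass prints every line of data_file whose number is not a key.
awk 'NR==FNR {del[$1]; next} !(FNR in del)' \
    "$tmpdir/lines_file" "$tmpdir/data_file" > "$tmpdir/kept"

cat "$tmpdir/kept"
```

One thing to watch for: if lines_file is empty, NR==FNR is also true while awk reads data_file, so nothing gets printed. Checking that the list is non-empty before running it avoids that surprise.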

glenn jackman


Thanks for all the answers but I like the different options!! – user2380782 Nov 04 '14 at 16:47

score 2 · Answer 2 · answered Nov 04 '14 at 03:46

perl -ne'
  BEGIN{ local @ARGV =pop; @h{<>} =() }
  exists $h{"$.\n"} or print;
' myfile.txt lines

mpapec


score 1 · Answer 3 · answered Nov 04 '14 at 02:16

You can remove the lines with a sed script file. First, make a list of the line numbers to remove, one number per line:

$ cat lines
1
34
45
678

Convert this list into a sed script:

$ sed -e 's|$| d|' lines >lines.sed
$ cat lines.sed
1 d
34 d
45 d
678 d

Now pass this script to sed with the -f option:

$ sed -i.bak -f lines.sed file_with_70k_lines

This removes the listed lines in place and keeps the original as file_with_70k_lines.bak.
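
With tens of thousands of deletions it is worth checking that the result has the expected number of lines. Below is a rough sketch of such a check on generated data. The file names data.txt and lines, and the choice of "every third line", are assumptions made for the example.

```shell
tmpdir=$(mktemp -d)

# Hypothetical input: a 1000-line file, and every third line number to delete.
seq 1 1000 > "$tmpdir/data.txt"
seq 3 3 1000 > "$tmpdir/lines"

# Turn the list into a sed script (3d, 6d, ...) and apply it in place,
# keeping the untouched original as data.txt.bak.
sed 's/$/d/' "$tmpdir/lines" > "$tmpdir/lines.sed"
sed -i.bak -f "$tmpdir/lines.sed" "$tmpdir/data.txt"

# Sanity check: lines before minus lines removed should equal lines after.
before=$(( $(wc -l < "$tmpdir/data.txt.bak") ))
removed=$(( $(wc -l < "$tmpdir/lines") ))
after=$(( $(wc -l < "$tmpdir/data.txt") ))
echo "before=$before removed=$removed after=$after expected=$((before - removed))"
```

The check only holds if the list has no duplicate or out-of-range numbers, so running it through sort -nu first may be a good idea.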

score 0 · Answer 4 · answered Nov 04 '14 at 02:14

If you can create a text file of the format

1d
34d
45d
678d

then you can run something like

sed -i.bak -f scriptfile datafile

Dinesh


score 0 · Answer 5 · answered Nov 04 '14 at 08:11

You can use a genuine editor for that, and ed is the standard editor.

I'm assuming your line numbers are in a file lines.txt, one number per line.

Then (with a blatant bashism):

ed -s file.txt < <(sed -n '/^[[:digit:]]\+$/p' lines.txt | sort -nr | sed 's/$/d/'; printf '%s\n' w q)

The first sed keeps only the lines of lines.txt that consist entirely of digits (just in case).

There's something quite special to take into account here: ed runs the delete commands one after another, so once you delete line 1, line 34 of the original file becomes line 33. It's therefore better to remove the lines from the end: start with 678, then 45, and so on. That's why we're using sort -nr (to sort the numbers in reverse order). A final sed appends d (ed's delete command) to each number.

Then we issue the w (write) and q (quit) commands.

Note that this overwrites the original file!
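
The renumbering effect is easy to reproduce on a tiny file. The sketch below does not use ed; it chains separate sed calls, each deleting one line, as a stand-in for deleting one line at a time. The five-line sample file is made up for the example.

```shell
tmpdir=$(mktemp -d)
printf '%s\n' a b c d e > "$tmpdir/orig"   # hypothetical 5-line file

# Goal: drop original lines 1 and 3 (a and c), one deletion at a time.

# Ascending order: once line 1 is gone, "line 3" is d, not c.
ascending=$(sed '1d' "$tmpdir/orig" | sed '3d' | tr '\n' ' ')

# Descending order: deleting line 3 first leaves line 1 where it was.
descending=$(sed '3d' "$tmpdir/orig" | sed '1d' | tr '\n' ' ')

echo "ascending:  $ascending"    # wrong lines removed
echo "descending: $descending"   # a and c removed, as intended
```

A single sed script such as 1d;3d does not have this problem, because sed addresses refer to input line numbers, which do not change as lines are deleted.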
