1

I have a file with ~700,000 lines and I would like to remove a bunch of specific lines (~30,000) using bash scripting or another method.

I know I can remove lines using sed:

sed -i.bak -e '1d;34d;45d;678d' myfile.txt # an example

I have the line numbers in a text file, but I don't know whether I can use it as input to sed. Maybe Perl?

Thanks

user2380782
  • What's the format of the text file? Massage that data so that it looks like a sed expression, although with 30,000 values you may bump into a limit on the size of the argument to sed. – William Pursell Nov 04 '14 at 02:04
  • Are your files sorted, or can they be sorted? – Robby Cornelissen Nov 04 '14 at 02:04
  • Look at this post, it is very similar... http://stackoverflow.com/questions/26670650/selecting-a-large-number-of-specific-rows-in-file/26672005#26672005 – Mark Setchell Nov 04 '14 at 09:34

5 Answers

2

A few options:

sed -f <(sed 's/$/d/' lines_file) data_file
awk 'NR==FNR {del[$1]; next} !(FNR in del)' lines_file data_file
perl -MPath::Class -e '
  %del = map {$_ => 1} file("lines_file")->slurp(chomp => 1);
  $f = file("data_file")->openr();
  while (<$f>) {
    print unless $del{$.};
  }
'
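
As a quick sanity check of the awk variant, here is a minimal sketch on throwaway data. The temporary directory and the sample numbers are my own assumptions, not part of the question; only the awk one-liner itself comes from the answer.

```shell
# Hypothetical sample data in a scratch directory, so no real file is touched.
tmp=$(mktemp -d)
seq 1 10 > "$tmp/data_file"               # ten lines: 1..10
printf '%s\n' 1 3 10 > "$tmp/lines_file"  # line numbers to delete

# First file (NR==FNR): remember each line number as a key of del.
# Second file: print only lines whose number is not a key of del.
result=$(awk 'NR==FNR {del[$1]; next} !(FNR in del)' "$tmp/lines_file" "$tmp/data_file")
printf '%s\n' "$result"

rm -r "$tmp"
```

This prints 2, 4, 5, 6, 7, 8 and 9, one per line. The lookup is a hash test per input line, so the cost should stay roughly linear in the size of the data file even with ~30,000 line numbers.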
glenn jackman
2
perl -ne'
  BEGIN{ local @ARGV =pop; @h{<>} =() }
  exists $h{"$.\n"} or print;
' myfile.txt lines
mpapec
1

You can remove the lines using a sed script file. First, make a list of the line numbers to remove (one line number per line).

$ cat lines
1
34
45
678

Convert this file to sed script format.

$ sed -e 's|$| d|' lines >lines.sed
$ cat lines.sed
1 d
34 d
45 d
678 d

Now pass this script file to sed with the -f option.

$ sed -i.bak -f lines.sed file_with_70k_lines

This will remove the lines.
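
If you want to try the steps end to end before touching the real file, something like the following sketch may help. The scratch directory, the 700-line sample and the variable names are assumptions made for illustration, and it writes to a separate output file instead of using -i.

```shell
# Hypothetical scratch setup: a 700-line sample and the four example line numbers.
tmp=$(mktemp -d)
seq 1 700 > "$tmp/data.txt"
printf '%s\n' 1 34 45 678 > "$tmp/lines"

# Turn each line number N into the sed command "Nd".
sed 's/$/d/' "$tmp/lines" > "$tmp/lines.sed"

# Apply the script; the output goes to a new file so the input stays intact.
sed -f "$tmp/lines.sed" "$tmp/data.txt" > "$tmp/out.txt"

# Compare line counts and confirm none of the deleted values survived.
before=$(wc -l < "$tmp/data.txt")
after=$(wc -l < "$tmp/out.txt")
removed=$((before - after))
still_there=$(grep -cx -e 1 -e 34 -e 45 -e 678 "$tmp/out.txt" || true)
echo "$removed lines removed, $still_there of them still present"

rm -r "$tmp"
```

Because the script is read from a file with -f, the list of line numbers never goes through the command line, which should sidestep the argument-size concern raised in the comments.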

Sriharsha Kalluru
0

If you can create a text file of the format

1d
34d
45d
678d

then you can run something like

sed -i.bak -f scriptfile datafile
Dinesh
0

You can use a genuine editor for that, and ed is the standard editor.

I'm assuming your lines are in a file lines.txt, one number per line, e.g.,

1
34
45
678

Then (with a blatant bashism):

ed -s file.txt < <(sed -n '/^[[:digit:]]\+$/p' lines.txt | sort -nr | sed 's/$/d/'; printf '%s\n' w q)

The first sed keeps only the lines of lines.txt that consist solely of digits (just in case).

There's one subtlety to take into account here: when you delete line 1, line 34 of the original file becomes line 33. So it's better to remove the lines starting from the end: first 678, then 45, and so on. That's why we're using sort -nr (to sort the numbers in reverse order). A final sed appends d (ed's delete command) to each number.

Then we issue the w (write) and q (quit) commands.

Note that this overwrites the original file!
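
To see the renumbering effect in isolation, here is a small sketch that imitates one-at-a-time deletion with chained sed calls. This is only an analogy I am assuming for illustration: each sed in the pipeline sees the already-shortened input, much like each ed command sees the already-edited buffer. The goal is to delete original lines 1 and 3 from a five-line input.

```shell
# Ascending order: once line 1 is gone, "3d" hits what was originally line 4.
ascending=$(seq 1 5 | sed '1d' | sed '3d' | paste -sd' ' -)

# Descending order: deleting a later line never shifts an earlier one.
descending=$(seq 1 5 | sed '3d' | sed '1d' | paste -sd' ' -)

echo "ascending:  $ascending"
echo "descending: $descending"
```

The ascending run leaves 2 3 5 (the wrong line was removed), while the descending run leaves 2 4 5, which is the intended result.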

gniourf_gniourf