27

Consider a text file with scientific data, e.g.:

5.787037037037037063e-02 2.048402977658663748e-01
1.157407407407407413e-01 4.021264347118673754e-01
1.736111111111111049e-01 5.782032163406526371e-01

How can I easily delete, for instance, every second line, or every 9 out of 10 lines in the file? Is it for example possible with a bash script?

Background: the file is very large but I need much less data to plot. Note that I am using Ubuntu/Linux.

Brian Tompsett - 汤莱恩
Ingo
  • Are you sure you want to point-sample the data like this? It may be better to do a downsampling, where the data from groups of N lines is averaged in some appropriate way. Point sampling potentially leads to aliasing issues. – Kaz Mar 27 '12 at 20:07
  • http://unix.stackexchange.com/questions/168004/delete-every-nth-line-in-shell – Ciro Santilli OurBigBook.com Jul 12 '15 at 10:32
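
The averaging alternative mentioned in the first comment could be sketched with awk roughly as follows. This is only one possible reading of "averaged in some appropriate way": a plain arithmetic mean of each column over consecutive groups of n lines, with a trailing partial group averaged over however many lines it contains. The helper name `block_average` is hypothetical, and the sketch assumes exactly two whitespace-separated numeric columns, as in the sample data.

```shell
# block_average N [FILE]: replace each group of N consecutive lines by one
# line holding the mean of column 1 and the mean of column 2.
# (Hypothetical helper name; assumes two numeric columns per line.)
block_average() {
    n=$1
    shift
    awk -v n="$n" '
        { sx += $1; sy += $2; count++ }
        count == n { printf "%.18e %.18e\n", sx / n, sy / n; sx = sy = count = 0 }
        END { if (count > 0) printf "%.18e %.18e\n", sx / count, sy / count }
    ' "$@"
}

# Stand-in data: four (x, y) pairs, averaged in groups of two
printf '1 10\n2 20\n3 30\n4 40\n' | block_average 2
```

With no file argument the function reads standard input, so it can sit in a pipeline in front of the plotting tool.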

6 Answers

63

This is easy to accomplish with awk.

Remove every other line (the odd-numbered lines are dropped, the even-numbered ones are kept):

awk 'NR % 2 == 0' file > newfile

Remove every 10th line:

awk 'NR % 10 != 0' file > newfile

The NR variable in awk holds the number of the current line (record). An expression outside of { } is a pattern, i.e. a condition; when a pattern has no action attached, the default action is to print the line.
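
For the thinning described in the question (dropping 9 out of every 10 lines), the same idiom can be flipped to keep only every n-th line. A minimal sketch, in which the helper name `keep_every` is hypothetical and `seq` stands in for the real data file:

```shell
# keep_every N [FILE]: print only lines N, 2N, 3N, ...
# (Hypothetical helper name; reads standard input when no FILE is given.)
keep_every() {
    n=$1
    shift
    awk -v n="$n" 'NR % n == 0' "$@"
}

# Stand-in data: the numbers 1..30, one per line
thinned=$(seq 30 | keep_every 10)
echo "$thinned"    # prints 10, 20 and 30 on separate lines
```

On a real file this would be called as `keep_every 10 file > newfile`.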

jordanm
  • Never heard of awk before. Will definitely check it out now! Thanks! – Ingo Mar 27 '12 at 18:52
  • Awk is very nice for processing text in shell scripts. It can also do floating point math, which bash can not do. Definitely worth the time to learn for shell coders. – jordanm Mar 27 '12 at 18:56
  • First command leaves lines with even ids in place, it doesn't remove it. If you want to remove it, use awk 'NR % 2 != 0' file > newfile. – Olga Dec 18 '13 at 17:18
  • What about removing columns? – Mihai Bujanca Apr 22 '16 at 11:55
  • First can be rewritten as `!(NR % 2)` and second as just `NR % 10` – 123 Dec 01 '17 at 09:02
6

How about perl?

perl -n -e '$.%10==0&&print'       # print every 10th line
sorpigal
  • He wants to delete every 10th line, rather than keep every 10th line. Easy change to your code, != instead of ==. – jordanm Mar 27 '12 at 18:52
  • No. He states "How can I easily **delete**, for instance, every second line, or **every 9 out of 10** lines in the file?"; deleting every 9 out of 10 lines means printing every 10th. As you say, once the solution is posted it's easy to adapt, so I have not bothered to correct other posters who made the same error. – sorpigal Mar 27 '12 at 18:55
  • After rereading the question again, I believe your interpretation is the correct one. – jordanm Mar 27 '12 at 18:58
  • Yes; deleting every 10th line wouldn't give much of a reduction in the data to be plotted. The aim seems to be to do a point sampling of some large data set. – Kaz Mar 27 '12 at 20:03
4

You could possibly do it with sed, e.g.

sed -n -e 'p;N;d;' file # print every other line, starting with line 1

If you have GNU sed, its first~step address form makes this pretty easy:

sed -n -e '0~10p' file # print every 10th line
sed -n -e '1~2p' file # print every other line starting with line 1
sed -n -e '0~2p' file # print every other line starting with line 2
sorpigal
2

Try something like:

awk 'NR%3==0{print $0}' file

This will print one line in three. Or:

awk 'NR%10<9{print $0}' file 

will print 9 lines out of ten (it drops the lines where NR%10 is 9, i.e. lines 9, 19, 29, ...).

Mat
  • Print is the default action, so `print $0` is not needed. – jordanm Mar 27 '12 at 18:08
  • I know. Looks too strange to me though. (I'm not an experienced awk user.) – Mat Mar 27 '12 at 18:11
  • @123: but it could be 9. – Mat Dec 01 '17 at 09:05
  • @Mat Yep, I misread. Since you wrote `will print 9 lines out of ten.` I thought the intention was to print every 9 lines out of ten (which I thought you'd done as a redundant version of just NR%10), whereas it actually removes the ninth line out of every 10. – 123 Dec 01 '17 at 09:09
2

This might work for you (GNU sed):

seq 10 | sed '0~2d' # delete every 2nd line
1
3
5
7
9
seq 100 | sed '0~10!d' # delete 9 out of 10 lines
10
20
30
40
50
60
70
80
90
100
potong
0

You can use awk and a shell script. Awk can be difficult, but...

This will delete specific lines you tell it to:

nawk -f awkfile.awk [filename]

Contents of awkfile.awk:

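BEGIN {
    # Default list of line numbers to delete; override with -v lines="..."
    if (!lines) lines = "3 4 7 8"
    n = split(lines, lA, FS)
    # Referencing an array element creates it, so linesA ends up keyed by line number
    for (i = 1; i <= n; i++)
        linesA[lA[i]]
}
# Print only the lines whose number is not in the list
!(FNR in linesA)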
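
Because the script only falls back to "3 4 7 8" when `lines` is empty, a different list can be supplied from the command line with `-v`, which assigns the variable before the BEGIN block runs. A small sketch of both cases, assuming plain `awk` in place of `nawk`, a temporary file for the script, and `seq` as stand-in input:

```shell
# Recreate the script from the answer in a temporary file (location is arbitrary)
script=$(mktemp)
cat > "$script" <<'EOF'
BEGIN {
if (!lines) lines="3 4 7 8"
n=split(lines, lA, FS)
for(i=1;i<=n;i++)
 linesA[lA[i]]
}
!(FNR in linesA)
EOF

# Default list: lines 3, 4, 7 and 8 are dropped
default_out=$(seq 8 | awk -f "$script")

# Custom list passed with -v: lines 2 and 5 are dropped
custom_out=$(seq 8 | awk -v lines="2 5" -f "$script")

echo "$default_out"
echo "$custom_out"
rm -f "$script"
```

This approach suits an explicit, short list of line numbers; for regular thinning of a large file, a modulus test on NR is simpler.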

Also, I can't remember whether Vim comes with a standard Ubuntu install. If not, get it.

Then open the file with vim: vim [filename]

Then type

:%!awk NR\%2

This will delete every other line. Just change the 2 to another integer for a different frequency.

broguyman