9

I have a big 150 GB CSV file and I would like to remove the first 17 lines and the last 8 lines. I have tried the following, but neither seems to work correctly:

sed -i -n -e :a -e '1,8!{P;N;D;};N;ba' 

and

sed -i '1,17d' 

Can someone help with sed or awk? A one-liner would be great.

fedorqui
Deano
  • I noticed the size is 150GB, how much free space do you still have on your disk? greater than 150GB? Is file in-place change necessary? – Kent Feb 07 '13 at 13:43
  • not much, another 100GB or so – Deano Feb 07 '13 at 13:44
  • I tried sed -i -n -e :a -e '1,8!{P;N;D;};N;ba' and sed -i '1,17d' but it doesn't seem to be working right. – Deano Feb 07 '13 at 13:45
  • @user1007727 then all solutions that use an intermediate temp file won't work for you. – Kent Feb 07 '13 at 13:48
  • If you have less memory available than the size of your file then you need to do this in chunks that are smaller than the memory available, removing sections of your original as you write them to your new file. Even in-place editors like "ed" need to buffer the contents of your file to operate on it. – Ed Morton Feb 07 '13 at 13:51
  • @user1007727 I suggest adding this requirement to your question: the target file is 150 GB, but you don't have 150 GB of free space, so how can you edit that file? – Kent Feb 07 '13 at 13:55
  • Any chance of not putting the first 17 lines and the last eight on the file in the first place? What happens to the file afterwards? Can the data be ignored whilst carrying out some other task on the file? – Bill Woodger Feb 07 '13 at 23:18
  • possible duplicate of [How to delete first two lines and last four lines from a text file with bash?](http://stackoverflow.com/questions/10460919/how-to-delete-first-two-lines-and-last-four-lines-from-a-text-file-with-bash) The other is tool agnostic, and top answers here are not sed. – Ciro Santilli OurBigBook.com Oct 16 '14 at 09:24

7 Answers

18

head and tail are better for the job than sed or awk.

tail -n+18 file | head -n-8 > newfile
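
To sanity-check the offsets before touching the 150 GB file, the same pipeline can be run on a small generated sample. This is only a sketch: the temporary sample file and the `result` variable are made up for illustration, and `head -n -8` assumes GNU head, since negative counts are a GNU extension.

```shell
# Build a 30-line sample (1..30) in a temporary file.
sample=$(mktemp)
seq 1 30 > "$sample"

# tail -n +18 starts output at line 18 (skipping 17 lines);
# head -n -8 (GNU only) prints everything except the last 8 lines.
result=$(tail -n +18 "$sample" | head -n -8)

printf '%s\n' "$result"   # lines 18 through 22 remain
rm -f "$sample"
```

Both tools stream their input, so memory use stays small, but the output file still needs room on disk next to the original.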
choroba
10
awk -v nr="$(wc -l < file)" 'NR>17 && NR<=(nr-8)' file
Ed Morton
2

All awk:

awk 'NR>y+x{print A[NR%y]} {A[NR%y]=$0}' x=17 y=8 file
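
The one-liner works as a ring buffer: the array holds the most recent y lines, and a line is printed only once y further lines have been read after it, so the last y lines never get printed. A small sketch on generated input, with the array renamed `buf` and the sample data invented for illustration:

```shell
result=$(seq 1 30 | awk -v x=17 -v y=8 '
    # Once more than x+y lines have been read, slot NR % y still holds
    # line NR-y, which is now known not to be among the last y lines.
    NR > x + y { print buf[NR % y] }
    # Store the current line, overwriting the slot that was just printed.
    { buf[NR % y] = $0 }
')
printf '%s\n' "$result"   # lines 18 through 22
```

Because it needs only one pass and never asks for the total line count, this form also works on piped input.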
Scrutinizer
1
Try this:

sed '{[/]<n>|<string>|<regex>[/]}d' <fileName>
sed '{[/]<adr1>[,<adr2>][/]}d' <fileName>

where

  1. /.../=delimiters

  2. n = line number

  3. string = string found in the line

  4. regex = regular expression corresponding to the searched pattern

  5. adr = address of a line (number or pattern)

  6. d = delete

Refer this link

0
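LENGTH=`wc -l < file`
head -n $((LENGTH-8)) file | tail -n $((LENGTH-17)) > newfile

Edit: As mtk pointed out in a comment, this won't work: tail operates on the output of head, which is already 8 lines shorter than the file. If you want to use wc and track the file length, you should use:

LENGTH=`wc -l < file`
head -n $((LENGTH-8)) file | tail -n $((LENGTH-8-17)) > newfile

or:

LENGTH=`wc -l < file`
head -n $((LENGTH-8)) file > tmpfile
LENGTH=`wc -l < tmpfile`
tail -n $((LENGTH-17)) tmpfile > newfile

In every case the output has to go to a different file: redirecting to file itself would truncate it before head or tail gets to read it.

All of which makes this solution less elegant than the one posted by choroba :)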

chepner
Adam Sznajder
  • This seems to be erroneous, as `tail` will operate on the output of `head`, so its line offset is counted against the wrong length. – mtk Feb 07 '13 at 13:39
0

I learnt this today for the shell.

{
  ghead -17  > /dev/null
  sed -n -e :a -e '1,8!{P;N;D;};N;ba'
} < my-bigfile > subset-of

This needs a head that does not consume more input than the lines it prints, so that sed picks up exactly at line 18; hence the use of ghead, the head from GNU coreutils.
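
Where GNU head is not available, one possible substitute (a swapped-in technique, not what the command above uses) is to skip the leading lines with the shell's own read, which takes exactly one line per call, and to add $d so that sed deletes the buffered lines explicitly when the last line arrives. The sample file and the variable names are placeholders:

```shell
sample=$(mktemp)
seq 1 30 > "$sample"

result=$(
    {
        # Skip the first 17 lines; read takes exactly one line per call,
        # so sed starts at line 18.
        i=0
        while [ "$i" -lt 17 ] && IFS= read -r skipped; do
            i=$((i + 1))
        done
        # Keep an 8-line buffer and delete it when the last line arrives.
        sed -e :a -e '$d;N;2,8ba' -e 'P;D'
    } < "$sample"
)
printf '%s\n' "$result"
rm -f "$sample"
```

The read loop is slow per line, which is acceptable for 17 lines but would not be a good way to skip millions.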

sotapme
0

Similar to Thor's answer, but a bit shorter:

sed -i '' -e $'1,17d;:a\nN;19,25ba\nP;D' file.txt

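The -i '' tells sed to edit the file in place. This is the BSD/macOS form, where -i takes a (possibly empty) backup suffix as a separate argument; GNU sed takes -i without the separate empty argument. Check the man page for your system.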

To delete {front} lines from the start and {tail} lines from the end, use the following numbers:

1,{front}d;:a\nN;{front+2},{front+tail}ba\nP;D

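(The curly braces are just placeholders; replace them with the actual numbers. Also, {front+1} ought to work as the start of the range, but it doesn't on my machine (macOS 10.12.4). I took that for a bug, though a more likely explanation is that line {front+1} is never the current line when the ba command is evaluated, because N has already advanced to {front+2}, so a range starting at {front+1} is never entered.)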

I'll try to explain how the command works. Here's a human-readable version:

1,17d     # delete lines 1 ... 17, goto start
:a        # define label a
N         # add next line from file to buffer, quit if at end of file
19,25ba   # if line number is 19 ... 25, goto start (label a)
P         # print first line in buffer
D         # delete first line from buffer, go back to start

First we skip 17 lines. That's easy. The rest is tricky, but basically we keep a buffer of eight lines. We only start printing lines when the buffer is full, but we stop printing when we reach the end of the file, so at the end, there are still eight lines left in the buffer that we didn't print - in other words, we deleted them.
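
The behaviour of N at end of input differs between implementations: BSD sed quits without printing the buffer, which the command above relies on, whereas GNU sed prints the pattern space before exiting. A variant that deletes the buffer explicitly with $d should not depend on that difference. The following is a sketch under that assumption, with a generated sample standing in for the real file and without -i so the original is left untouched:

```shell
sample=$(mktemp)
seq 1 30 > "$sample"

# 1,17d      drop the first 17 lines
# :a         loop label
# $d         on the last line, delete the whole buffer (the final 8 lines)
# N          append the next input line to the buffer
# 19,25ba    keep filling until the buffer holds 8 lines
# P;D        print and remove the oldest line, then restart the cycle
result=$(sed -e '1,17d' -e ':a' -e '$d;N;19,25ba' -e 'P;D' "$sample")

printf '%s\n' "$result"
rm -f "$sample"
```

Splitting the script across several -e options keeps the label on its own fragment, which avoids relying on how a particular sed parses a semicolon after a label.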