2

I have around 50 big text files (~4GB) and I only need to replace one string located in the first 100 lines of these files. What I need is a Unix command line that looks for the first match, replaces it in place, and stops.

I've tried playing with sed but I'm still struggling to get a satisfying result.

JackWhiteIII
esgn
  • Somewhat related: [How to edit 300 GB text file (genomics data)?](http://stackoverflow.com/q/16900721/1983854). – fedorqui Jun 16 '15 at 15:30
  • Also somewhat similar [Edit huge SQL data file](http://stackoverflow.com/questions/30727191). That file's about 23 GiB on a machine with 20 GiB free space. – Jonathan Leffler Jun 16 '15 at 16:10

3 Answers

7

You can edit up to the first match using sed:

sed -e '1,/pattern/{s/pattern/replace/;}'

On lines 1 to N-1 (where line N contains the pattern), the substitution does nothing; on line N, it does the real work. Thereafter, you're no longer in the 1,/pattern/ range of lines so there is no further transformation.
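To see the range behaviour on a small scale, here is a minimal sketch using made-up sample lines (alpha, beta, gamma are placeholders, not from the question). The pattern first appears on line 3 and again on line 5; only line 3 should change.

```shell
# Sample input: "pattern" is on lines 3 and 5.
# The 1,/pattern/ range ends at line 3, so line 5 is left alone.
result=$(printf '%s\n' alpha beta pattern gamma pattern |
    sed -e '1,/pattern/{s/pattern/replace/;}')
printf '%s\n' "$result"
```

The second occurrence is untouched because the range can only start at line 1, so it never reopens.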

Note that this doesn't work if line 1 matches the pattern; it then makes changes in line 1 and the next line that matches the pattern. With GNU sed at least, you can change the 1 to 0 and that works OK.

printf "%s\n" pattern pattern pattern pattern |
sed -e '0,/pattern/{s/pattern/replace/;}'

However, the description says "in the first 100 lines" and while line 1 is in the first 100 lines, that isn't the way you'd normally describe it when it appears on line 1.

You can add a -i option to overwrite the original file once you've tested it. Beware: not all versions of sed support -i. On Mac OS X (BSD sed), the backup suffix argument is mandatory, as in -i.bak, though it can be empty: use -i ''. By contrast, GNU sed takes an optional suffix which must be attached directly to the -i option. Hence, -i.bak works with both GNU and Mac (BSD) sed; other uses of the -i option are specific to the variant of sed you're using.
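As a rough sketch of applying this to a batch of files with the -i.bak form, the loop below works on two tiny demo files in a scratch directory. The file names and contents are hypothetical; for the real files you would point the glob at your own directory.

```shell
# Hypothetical demo data: two small files in a scratch directory.
workdir=$(mktemp -d)
printf '%s\n' header pattern body pattern > "$workdir/a.txt"
printf '%s\n' intro pattern tail > "$workdir/b.txt"

# -i.bak is accepted by both GNU and BSD sed; each original is kept as FILE.bak.
for f in "$workdir"/*.txt; do
    sed -i.bak -e '1,/pattern/{s/pattern/replace/;}' "$f"
done
```

Keep in mind that each .bak file is a full copy of the original, so with files of this size it is worth checking free disk space first and deleting the backups once the result has been verified.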

Jonathan Leffler
2
sed -i '1,100 { :a; N; $! ba; s/input/output/ }' file
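  • :a; N; $! ba appends lines to the pattern space in a loop. As written, $! ba keeps branching until the last line, so the loop runs to the end of the file rather than stopping at line 100.
  • The accumulated lines are treated as one string.
  • The substitution then affects only the first match in that string.
  • -i edits the file in place.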

q can't be used after replacement since it will stop printing the rest of the lines.
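One way to stop substituting after the first match without q, and without accumulating lines in the pattern space, is to print the rest of the file in a small n loop. This is a different technique from the command above, sketched here with the same input/output placeholders; the function name first_match_replace is made up for the example, and the script is split across -e options so it should not depend on GNU-only one-liner syntax.

```shell
# Replace only the first "input" found within lines 1-100,
# then stream every remaining line through unchanged.
first_match_replace() {
    # s//output/ reuses the last regex (/input/).
    # The :a / n / ba loop prints the current line and loads the next one,
    # so no later line ever reaches the substitution again.
    sed -e '1,100{' \
        -e '/input/{' \
        -e 's//output/' \
        -e ':a' \
        -e 'n' \
        -e 'ba' \
        -e '}' \
        -e '}' "$@"
}

# Demo on made-up data: only the first of the two matches changes.
result=$(printf '%s\n' first input middle input | first_match_replace)
printf '%s\n' "$result"
```

Because only one line is held at a time, memory use should stay small even on very large files. Add -i (or -i.bak) and a file name as arguments once the output looks right.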

Also, before running the sed command above, I recommend checking whether the pattern string is in the file, and where, with:

sed -n '/patternstring/{=;p}' file

where = prints the line number (a grep-style sed command),

or, if you want to quit immediately after finding the first match:

sed -n '/patternstring/{=;p;q}' file
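Since the string is expected within the first 100 lines, the check can also be bounded so that sed never reads further than line 100 of a large file. A minimal sketch, assuming the same patternstring placeholder; the function name find_first_match is made up for the example.

```shell
# Report the line number and text of the first match,
# giving up after line 100 if nothing has matched by then.
find_first_match() {
    sed -n -e '/patternstring/{=;p;q;}' -e '100q' "$@"
}

# Demo on made-up data: the match is on line 3.
result=$(printf '%s\n' one two patternstring four | find_first_match)
printf '%s\n' "$result"
```

An empty result means the pattern was not found in the first 100 lines.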
josifoski
  • I just tried this on a 500-line input, and it had no effect whatsoever. Which is odd, because I was expecting a completely different failure mode. – This isn't my real name Jun 16 '15 at 15:46
  • let me check in practice, needs something more + in appending 100 lines – josifoski Jun 16 '15 at 15:46
  • Now it should work, i've added $! which will always be true, so first 100 lines will be appending – josifoski Jun 16 '15 at 15:49
  • No, it's just editing the first hundred lines in place and passing the rest unchanged, you don't need all that fancy. Just `'1,100 s/input/output/'` – This isn't my real name Jun 16 '15 at 15:50
  • it will reflect on all 100 lines if string is more than once present in them, with my solution only on first matched input string – josifoski Jun 16 '15 at 15:51
  • Also, since this is "Unix", not "Linux", it bears mentioning that the "`-i`" option for editing in place is not supported by POSIX (see [here](http://pubs.opengroup.org/onlinepubs/9699919799/utilities/sed.html)), and that POSIX doesn't allow for labels to be followed by other commands on the same line. – This isn't my real name Jun 16 '15 at 15:53
  • Ah. Missed the only-match-first clause. Still, the initial solution had no effect in my test. – This isn't my real name Jun 16 '15 at 15:54
2

If you want to treat the first occurrence without knowing exactly where it is, you could use ed. It is a very old line editor written at a time when memory was scarce. It may be a little less efficient than sed here, but it is both simpler and more robust against the pattern not being exactly where expected.

echo '/input/s/input/output/
wq' | ed file
Serge Ballesta