
I need to use awk to insert the lines of one file into another file, right after the line that matches a pattern (pattern_string). I don't want solutions that use sed.

Input files: file1.txt and file2.txt
Output file: mergedfile.txt

Example files:

file1.txt

1
2
pattern_string
7
8
9

file2.txt

3
4
5
6

Expected merged file (mergedfile.txt):

1
2
pattern_string
3
4
5
6
7
8
9
fedorqui

3 Answers


It might be better to use sed for this. Using the `r` command you can read in a file whenever a line matches pattern_string:

sed "/pattern_string/r file2.txt" file1.txt

Which returns:

1
2
pattern_string
3
4
5
6
7
8
9
fedorqui
  • Thanks for your prompt reply. Not sure why, but for very big files sed throws a "couldn't write items to stdout: no space left on the device" error. I cannot add space to /tmp, so I need to switch to awk. – user3640480 May 15 '14 at 11:21
  • This is not a problem with the tool you are using, but with disk space: if the filesystem is 100% full, nothing can be done. Try testing on another filesystem. Also, do use `sed -i` for in-place editing, so that `file1.txt` is updated with the new content. – fedorqui May 15 '14 at 11:22
  • FYI `sed -i` doesn't actually do in-place editing, it just doesn't require you to specify the tmp file name manually but it does still use one behind the scenes so it wouldn't help if you're running out of space. – Ed Morton May 15 '14 at 14:51
  • Uhms, good to know @EdMorton. I had never thought about that, but it makes perfect sense. – fedorqui May 15 '14 at 14:58
  • Yeah, unfortunately there's no simple way to really do in-place editing, but see http://stackoverflow.com/a/17331179/1745001 for how you can do it if necessary. – Ed Morton May 15 '14 at 15:01

Edited the first solution to avoid a problem pointed out by Ed Morton.

awk 'FNR==NR {a[i++]=$0;next} /pattern_string/ {print; for(i=0;i in a;i++) print a[i];next}1' file2.txt file1.txt

Output:

1
2
pattern_string
3
4
5
6
7
8
9

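This reads file2 into the array a (FNR==NR is normally true only while the first file argument is being read), then prints file1 line by line. When a line matches the pattern, it prints that line followed by the stored lines of file2, and then carries on printing the rest of file1.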
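For readers who find the one-liner dense, here is a commented multi-line sketch of the same read-into-an-array approach. It assumes the file names from the question and overwrites file1.txt and file2.txt with the sample data, so run it in a scratch directory. One deliberate change: it tests `FILENAME == ARGV[1]` instead of `FNR==NR`, because `FNR==NR` is also true for every line of file1 when file2 is empty, in which case the one-liner would store file1 in the array and print nothing.

```shell
#!/bin/sh
# Sketch: insert the contents of file2.txt after every line of file1.txt
# that matches pattern_string, writing the result to mergedfile.txt.

# Recreate the example input files from the question.
printf '%s\n' 1 2 pattern_string 7 8 9 > file1.txt
printf '%s\n' 3 4 5 6 > file2.txt

awk '
    # While reading the first file argument (file2.txt), store each line
    # in order. FILENAME == ARGV[1] stays correct even if file2.txt is
    # empty, unlike FNR == NR.
    FILENAME == ARGV[1] { a[++n] = $0; next }

    # For every line of file1.txt: print it, and if it matches the
    # pattern, print the stored lines of file2.txt right after it.
    {
        print
        if (/pattern_string/)
            for (i = 1; i <= n; i++)
                print a[i]
    }
' file2.txt file1.txt > mergedfile.txt

cat mergedfile.txt
```

Counting lines with `a[++n]` and looping `for (i = 1; i <= n; i++)` keeps the inserted lines in their original order, which is the ordering concern raised in the comments about `for (i in a)`.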


Or you can use:
awk 'BEGIN {
       while ((getline < "file1.txt") == 1) {
         print
         if ($0 ~ /pattern_string/) {
           while ((getline < "file2.txt") == 1) print
           close("file2.txt")
         }
       }
       close("file1.txt")
     }'

This produces the same output, but instead of storing file2 in an array it re-reads file2 from the beginning each time the pattern matches.
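If you do want the getline route, here is a sketch of how it might look when written along the lines of the general advice in the getline article linked in the comments: read into named variables (line, extra) rather than $0, and pass the file names in with -v instead of hard-coding them. The variable names main and ins are made up for this example. This is only an illustration, not a claim about which specific problems the comments had in mind, and like the version above it still re-reads file2.txt on every match.

```shell
#!/bin/sh
# Sketch of a getline-based merge: read into named variables instead of $0
# and stop on any getline result other than 1 (0 = EOF, -1 = error).

printf '%s\n' 1 2 pattern_string 7 8 9 > file1.txt
printf '%s\n' 3 4 5 6 > file2.txt

awk -v main=file1.txt -v ins=file2.txt '
BEGIN {
    while ((getline line < main) > 0) {
        print line
        if (line ~ /pattern_string/) {
            # Copy the whole insert file, then close it so that a later
            # match re-reads it from the beginning.
            while ((getline extra < ins) > 0)
                print extra
            close(ins)
        }
    }
    close(main)
}' > mergedfile.txt

cat mergedfile.txt
```

Because the program consists only of a BEGIN block, awk never enters its normal record loop, so NR, FNR and FILENAME are not maintained for these reads.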

a5hk
  • `for(i in a)` can print the lines from `a` in any order, not necessarily the order they're read in. You want `for(i=1;i in a;i++)` instead. Do not use that `getline` solution; it is wrong in several ways. Make sure to read and fully understand http://awk.info/?tip/getline if you're considering using `getline`. – Ed Morton May 15 '14 at 15:12
  • @EdMorton Thanks, I updated the first solution. I read the link you provided and updated the `getline` version a bit. Honestly, I am still not sure what was/is wrong. I would appreciate it if you could tell me what is/was wrong, besides the changes I have made. – a5hk May 15 '14 at 16:42
  • Honestly, I just can't be bothered. I wrote that article with the co-operation of every awk expert I've heard of, so that none of us would need to keep explaining the issues and, more importantly, to show the right ways to use getline. But STILL I end up having to pick apart every use of getline, as the users never believe there's an issue (e.g. see http://stackoverflow.com/a/23622869/1745001, which took me about 10 comments and an updated answer with examples to convince the author). So, if you're happy with it, go for it, but at LEAST ask yourself what each loop is doing for you. – Ed Morton May 15 '14 at 22:03
  • @EdMorton it looks like the link you posted (http://awk.info/?tip/getline) no longer exists. Do you know if that content has moved elsewhere? – larsks Aug 04 '17 at 16:06
  • Yes, that site folded, but the same article is at http://awk.freeshell.org/AllAboutGetline – Ed Morton Aug 04 '17 at 17:13

If the included files can themselves contain lines that should be replaced by yet other files, this script recursively expands every line of the form "include subfile". It writes the expanded result to a tmp file; then, by clearing ARGV[1] (the top-level input file) and leaving ARGV[2] (the tmp file) in place, it lets awk do any normal record parsing on the expanded result, since that is now stored in the tmp file. If you don't need that, just "print" to stdout and remove the other references to the tmp file and ARGV[2].

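  awk '
  # Recursively copy "file" to the tmp file (ARGV[2]), replacing each
  # "include <subfile>" line with the expanded contents of <subfile>.
  function read(file,    rec, arr) {
      while ( (getline rec < file) > 0) {
          split(rec,arr)
          if (arr[1] == "include") {
              read(arr[2])
          } else {
              print rec > ARGV[2]
          }
      }
      close(file)
  }
  BEGIN{
      read(ARGV[1])
      ARGV[1]=""        # skip the original file in the main loop
      close(ARGV[2])    # flush the tmp file before awk reads it
  }
  1                     # print every record of the expanded tmp file
  ' file tmp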
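As a sketch of the stdout-only variant described above: the same read() function with `print rec > ARGV[2]` replaced by a plain `print rec`, wrapped in a shell function and called from a BEGIN-only program, so no tmp file is involved. The file names main.txt, sub1.txt and sub2.txt and the function name expand_includes are hypothetical examples.

```shell
#!/bin/sh
# Sketch: stdout-only variant of the recursive include expander.
# Hypothetical example files: main.txt includes sub1.txt, which includes sub2.txt.
printf '%s\n' 'top line' 'include sub1.txt' 'last line' > main.txt
printf '%s\n' 'from sub1' 'include sub2.txt' > sub1.txt
printf '%s\n' 'from sub2' > sub2.txt

# expand_includes FILE: print FILE with every "include SUBFILE" line
# replaced by the (recursively expanded) contents of SUBFILE.
expand_includes() {
    awk '
    function read(file,    rec, arr) {
        # getline returns 1 per line, 0 at EOF and -1 on error,
        # so "> 0" stops the loop in both of the last two cases.
        while ((getline rec < file) > 0) {
            split(rec, arr)
            if (arr[1] == "include")
                read(arr[2])    # recurse into the included file
            else
                print rec
        }
        close(file)
    }
    BEGIN { read(ARGV[1]) }
    ' "$1"
}

expand_includes main.txt
```

There is no guard against cyclic includes (a file that includes itself, directly or indirectly), so such input can loop forever. If an included file cannot be read, getline returns -1, the loop ends immediately, and that include line is silently dropped.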
Ed Morton