I have hundreds of HTML files, and each of them contains the term "Issue 1" at two different places. My goal is to increment the issue number across the files.

For example: The files are A1, A2, A3... etc. The file A1 has "Issue 1" given at two places in the file. This file does not need to be changed.

The file A2 again has "Issue 1" given at two places in the file. This file needs to be changed so that "Issue 1" is automatically incremented to "Issue 2" at both places, and so on for all the files.

How can I do this in a gawk script? I love Notepad++, but I am not a technical (programmer) user. I need a relatively simple way to perform this task that is quick and easily repeatable.

James Z
Mel
  • Is the number really present in the file name (as in `A1, A2,..`) or is the numbered file name only an example? You said increment instead of derive from filename. – Lars Fischer May 09 '17 at 12:28
  • The actual file names are pa501.html, pa502.html, pa503.html..... – Mel May 09 '17 at 12:37
  • I do not wish to change the filenames in any way. I only wish to edit the content in the files. – Mel May 09 '17 at 12:43
  • The `gawk` tag is visited by far fewer people far less often than the `awk` tag so to get more people looking at your question with a view to helping you, tag your awk questions with `awk`, optionally add a `gawk` tag if you like, and state in your question that you're using gawk. – Ed Morton May 18 '17 at 19:14
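Since the issue number is meant to follow the order of the files rather than the number in the file name, a dry run can confirm which number each file would receive before anything is changed. The following is only a sketch: the scratch directory, the three stand-in files, and their content are assumptions for illustration, and it expects a POSIX shell (for example under Cygwin). It relies only on `FNR` and `FILENAME`, so it should not depend on the gawk version.

```shell
# Scratch directory with three stand-in files (names follow the pa5xx
# pattern mentioned in the comments; the content is made up).
demo=$(mktemp -d)
for f in pa501 pa502 pa503; do
    printf '<title>Issue 1</title>\n<p>Issue 1</p>\n' > "$demo/$f.html"
done

# Dry run: FNR == 1 is true on the first line of each file, so n counts files.
# Nothing is written; the command only reports the planned numbering.
mapping=$(cd "$demo" && awk 'FNR == 1 { n++; print FILENAME " -> Issue " n }' pa*.html)
echo "$mapping"
```

The order is whatever the shell's `pa*.html` expansion produces, which is alphabetical, so a name such as pa1000.html would sort before pa501.html. Empty files have no first line and would not be counted.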

2 Answers

All you need in the awk script is:

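@include "inplace"   # pulls in inplace.awk, which turns on in-place editing (needs gawk 4.1+)
{ gsub(/\<Issue 1\>/,"Issue " ARGIND); print }   # ARGIND = position of the current file on the command line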

Then just call it as awk -f script A* or whatever the syntax is on Windows. Get a current version of gawk first, though - your version is more than 5 years out of date and missing a TON of useful functionality. Also seriously consider getting cygwin and running gawk and all other UNIX tools from there - it'll make your life vastly easier!

Ed Morton

Here is a GNU awk script that might help you.

script.awk

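# First line of each input file: count the file and pick its output name.
FNR==1    { if (outFile != "") close(outFile)   # finish the previous output file
            fileCount++
            outFile = FILENAME ".mod"
          }
# Every line: replace "Issue 1" unless another digit follows (e.g. "Issue 10").
          { $0 = gensub(/Issue 1([^0-9]|$)/, "Issue " fileCount "\\1", "g")
            print > outFile
          }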
  1. make a backup copy of your original files
  2. run the script like this: awk -f script.awk pa*.html
  3. make sure that the generated pa*.html.mod files are the correct result
    1. only then: rename the .mod files back to the original filenames
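The steps above can be sketched as a short shell session. This is an illustration under assumptions, not a tested recipe for the asker's setup: it expects a POSIX shell (for example under Cygwin), the scratch directory and the two stand-in files are made up, and the inline script.awk is a simplified stand-in that uses plain `gsub()` so that it also runs on awk versions without `gensub()`.

```shell
# Scratch directory with two stand-in files (hypothetical content).
work=$(mktemp -d)
cd "$work" || exit 1
for f in pa501 pa502; do
    printf '<title>Issue 1</title>\n<p>Issue 1</p>\n' > "$f.html"
done

# Simplified stand-in for script.awk: plain gsub(), so unlike the gensub()
# version it would also rewrite "Issue 10" and similar.
cat > script.awk <<'EOF'
FNR == 1 { if (out != "") close(out); n++; out = FILENAME ".mod" }
         { gsub(/Issue 1/, "Issue " n); print > out }
EOF

# 1. Back up the originals.
mkdir backup && cp pa*.html backup/

# 2. Generate the pa*.html.mod files.
awk -f script.awk pa*.html

# 3. Check the result: each .mod file should differ from its original only
#    in the issue number (the first file should not differ at all).
for f in pa*.html; do
    diff "$f" "$f.mod" || true     # diff exits non-zero when the files differ
done

# 4. Only then: move the .mod files over the originals.
for f in pa*.html; do
    mv "$f.mod" "$f"
done
```

If step 3 shows anything unexpected, the untouched copies are still in the backup directory.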
Lars Fischer
  • The script is generating an error. It says: awk: script.awk:5: fatal: expression for '>' redirection has null string value – Mel May 09 '17 at 13:24
  • Did you redefine `outFile`? what does `print outFile` say if added as last line in the `BEGINFILE` action and additionally: what does it say, when added before the `print > outFile`? – Lars Fischer May 09 '17 at 13:37
  • I do not understand what you are trying to say since I do not know awk programming. The changes that I have made are: 1. I have added script.awk to the "C:\Program Files\GnuWin32\bin" directory. 2. I made a directory called "fileDir" in c:\ and copied my files to this directory. 3. I then opened a command prompt and changed to the directory "C:\Program Files\GnuWin32\bin". I ran your given command from here. – Mel May 09 '17 at 13:43
  • I would put the script into the same dir as the html files and run the command from there. I don't know what happens in your system. I assume that it has something to do with the paths to the html files and the current directory. Put everything in the same directory, change into that directory, and run the script from there. – Lars Fischer May 09 '17 at 13:51
  • I put all my files and script.awk in the bin directory but the same error is showing up. By the way, a procedure given in another link executed properly, but of course it does not apply to me as my situation is different. https://notepad-plus-plus.org/community/topic/12266/find-replace-with-increment-across-multiple-files – Mel May 09 '17 at 14:01
  • It does more or less the same, if you feel better you can replace the last print statement in my script with `print > (FILENAME ".mod")`, but I still believe your problem has something to do with the paths. You might try to give the script only one file pa501.html instead of `pa*.html` and see what happens then. – Lars Fischer May 09 '17 at 14:14
  • As a debugging help: please consider adding `print outFile` as a fifth line in the script. That way you can see what the content of `outFile` is and whether it makes any sense. At the moment it seems that the variable `outFile` is empty. – Lars Fischer May 09 '17 at 14:17
  • BTW: my version of gawk is 4.1.3, what is your version? It seems (from https://groups.google.com/forum/#!topic/comp.lang.awk/5hhznOqIKdI) that BEGINFILE is not available in older versions; it was added in gawk 4.0. Thus I modified the script to use a different condition. – Lars Fischer May 09 '17 at 14:22
  • I have implemented your 3rd last comment. The script is now executing properly but the issue number vanishes altogether. My Gawk version is 3.1.6. – Mel May 09 '17 at 14:42
  • That is expected when the `BEGINFILE` action is never executed. Try the last version of the script, with the `FNR==1` instead of `BEGINFILE`. – Lars Fischer May 09 '17 at 15:03
  • That worked perfectly. Suppose in future I want to directly overwrite the html files without creating mod files, what changes need to be made? – Mel May 09 '17 at 15:11
  • You could try https://www.gnu.org/software/gawk/manual/html_node/Extension-Sample-Inplace.html but that seems to be based on `BEGINFILE` or see http://stackoverflow.com/a/16531920/4086774 , short version: it seems you need a more recent gawk version for `directly overwriting` the files. – Lars Fischer May 09 '17 at 15:22
  • OK, thanks a lot Lars. Sorry for the inconvenience, but since I am new to this community, can you please let me know how to assign points? – Mel May 09 '17 at 15:29
  • `{print gensub(/(Issue )1\>/, "\\1" ARGIND, "g") > (FILENAME ".mod")}` is all you need to implement the functionality of that script. – Ed Morton May 10 '17 at 04:48
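On the follow-up question about overwriting the files directly: with gawk 4.1 or later, `gawk -i inplace` is the documented route. For an older gawk such as 3.1.6, one possible workaround is to write each result to a temporary file and move it over the original. The sketch below assumes a POSIX shell (for example under Cygwin); the scratch directory, the stand-in files, and the `.tmp` suffix are made up for illustration. It avoids `gensub()` and `ARGIND` and instead passes a counter in with `-v`, checking by hand that no digit follows the "1".

```shell
# Scratch directory with two stand-in files (hypothetical content).
work=$(mktemp -d)
cd "$work" || exit 1
printf 'Issue 1 and Issue 10\n' > pa501.html
printf 'Issue 1 and Issue 10\nIssue 1\n' > pa502.html

n=0
for f in pa*.html; do
    n=$((n + 1))                       # 1 for the first file, 2 for the second, ...
    awk -v n="$n" '{
        line = $0; out = ""
        while ((i = index(line, "Issue 1")) > 0) {
            after = substr(line, i + 7, 1)          # character following the "1"
            if (after ~ /[0-9]/) {
                out = out substr(line, 1, i + 6)    # e.g. "Issue 10": keep as is
            } else {
                out = out substr(line, 1, i + 5) n  # keep "Issue ", swap the number
            }
            line = substr(line, i + 7)              # continue after the old "1"
        }
        print out line
    }' "$f" > "$f.tmp" && mv "$f.tmp" "$f"          # overwrite only if awk succeeded
done
```

Because this replaces the originals, a backup copy beforehand remains a sensible precaution.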