I have a urlwatch .yaml file that has this format:

name: 01_urlwatch update released
url: "https://github.com/thp/urlwatch/releases"
filter:
  - xpath:
      path: '(//div[contains(@class,"release-timeline-tags")]//h4)[1]/a'
  - html2text: re
---
name: 02_urlwatch webpage
url: "https://thp.io/2008/urlwatch/"
filter: 
  - html2text: re
  - grep: (?i)current\sversion  #\s Matches a whitespace character
  - strip # Strip leading and trailing whitespace 
---
name: 04_RansomWhere? Objective-See
url: "https://objective-see.com/products/ransomwhere.html"
filter:
  - html2text: re
  - grep: (?i)current\sversion #\s Matches a whitespace character
  - strip #Strip leading and trailing whitespace
---
name: 05_BlockBLock Objective-See
url: "https://objective-see.com/products/blockblock.html"
filter:
  - html2text: re
  - grep: (?i)current\sversion #(?i) \s 
  - strip #Strip leading and trailing whitespace
---

I need to "re-index" the two-digit number according to the order in which each name: line occurs. In this example the first and second occurrences of name: are followed by the correct index numbers, but the third and fourth are not.

In the example above, the third and fourth occurrences of name: should be re-indexed so that 03_ and 04_ come before the text string. That is: a two-digit index number followed by an underscore.

Also, there are instances of the string #name: which should not be counted in the re-indexing. (They have been commented out, so urlwatch does not act on those lines.)

I tried using sed but had trouble generating an index number based on the occurrence of the string. I don't have GNU sed, but I can install it if that is the only way to do this.

John
  • Do not use `sed`. I believe this could be done in an awk script: match the expression, replace with `gsub` with a new number, output. – KamilCuk Oct 15 '20 at 21:38
  • If you are on macOS, install GNU sed and GNU awk anyway, or else you lose a lot of features using their broken defaults (confirmed by SO mac users). `sed` is not suitable for this job either, as Kamil already said. – thanasisp Oct 15 '20 at 22:51

3 Answers

I think this could be ok:

awk '/^name: / { sub(/[0-9]{2}/, ++i); sub(/ [1-9][^0-9]/,"\x0&"); sub(/\x0 /," 0") }; 1' your_input

On every line starting with name: , we substitute the first two-digit number ([0-9]{2}) with the counter i after incrementing it (i starts out undefined, which awk treats as 0, so the first increment gives 1). A second substitution marks the line only if the new number has a single digit, and a third substitution adds the leading 0 and removes the mark.

Probably it's a bit fragile, but given your explanation, it looks fine.
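
To make the two ideas in that explanation concrete, here is a small sketch. It is not part of the one-liner above, and count_demo and pad2 are hypothetical helper names. It shows that an unset awk variable counts as 0 in arithmetic, and it pads a one-digit number with an explicit test rather than a marker byte:

```shell
# An unset awk variable counts as 0 in arithmetic, so the first ++i yields 1.
count_demo() {
  awk 'BEGIN { print ++i; print ++i }'
}

# Pad a number to two digits with an explicit test instead of a marker byte.
# (pad2 is a hypothetical helper, only meant for numbers from 1 to 99.)
pad2() {
  awk -v n="$1" 'BEGIN { print (n < 10 ? "0" n : n) }'
}
```

Here pad2 7 prints 07 and pad2 10 prints 10, so the jump from one digit to two digits does not need a separate marking step.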

Enlico
  • I just need a two digit index number. That's what I asked. The commands you wrote skips to a three digit number after 10 and then between every additional ten: 01, 02, 03, 04, 05, 06, 07, 08, 09, 10, 101, 102, 103, 104, 105, 106, 107, 108, 109, 20, 201, 202, 203, 204, 205, 206, 207, 208, 209, 30,.... Thanks for your help. – John Oct 16 '20 at 16:01
  • @John, I think it's fixed now, fwiw. – Enlico Oct 16 '20 at 16:22
  • I did have a couple of upvotes on this question at one point. I guess someone came along and downvoted me. I don't think it's simple for a novice CLI user. – John Oct 16 '20 at 17:04
  • Downvotes come everywhere, even on the best questions. However, does my fix work now? – Enlico Oct 16 '20 at 18:15

This might work for you (GNU sed):

sed -E '/^name:/{x;s/.*/expr & + 1/e;s/^.$/0&/;x;G;s/[0-9]+(.*)\n(.*)/\2\1/}' file

Match a line beginning with name:, increment a counter kept in the hold space, append the hold space to the pattern space, then match the first run of digits and use the captured groups to substitute the counter for it.
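
The same script can be spread over several lines with a comment per command, which may make the steps easier to follow. This is only a sketch: reindex_sed is a hypothetical wrapper name, it assumes GNU sed (Homebrew installs it as gsed), and on the first match the hold space is empty, so it relies on `expr  + 1` printing 1, which GNU expr does and other expr implementations may not.

```shell
# Assumes GNU sed (gsed under Homebrew) and GNU expr; reindex_sed is a
# hypothetical wrapper name, not something from the answer itself.
reindex_sed() {
  sed_bin=$(command -v gsed || command -v sed)
  "$sed_bin" -E '
    /^name:/{
      # Swap: the counter (hold space) comes into the pattern space.
      x
      # Turn it into "expr N + 1", run that, and keep the output.
      s/.*/expr & + 1/e
      # Pad a single digit with a leading zero.
      s/^.$/0&/
      # Swap back: the name line returns, the counter is stored again.
      x
      # Append a newline plus the counter to the name line.
      G
      # Replace the first run of digits with the counter, drop the tail.
      s/[0-9]+(.*)\n(.*)/\2\1/
    }
  ' "$@"
}
```

Lines beginning with #name: do not match /^name:/, so they pass through untouched and are not counted.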

potong

awk '/^name/{sub(/[0-9]{2}/,sprintf("%02d", ++c))}1' file

For any line starting with "name" we replace the first two-digit number with our counter, which increments on every occurrence. The awk sprintf function (standard awk, not a GNU extension) prints it with a leading zero when needed.
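
As a worked example of the approach just described, here is a slightly more defensive variation wrapped in a shell function. It is a sketch, not the exact command above: reindex_awk is a hypothetical name, the pattern is tightened to `^name: NN_` (my assumption about the file layout, based on the sample in the question), and the digits are written as [0-9][0-9] because some awk builds do not support the {2} interval syntax.

```shell
# reindex_awk is a hypothetical helper; it reads the named files, or stdin.
reindex_awk() {
  awk '
    # Only lines that really start with "name: NN_" are renumbered;
    # commented-out "#name:" lines do not match and are not counted.
    /^name: [0-9][0-9]_/ {
      sub(/[0-9][0-9]/, sprintf("%02d", ++c))
    }
    # Print every line, changed or not.
    { print }
  ' "$@"
}
```

A possible way to use it, with placeholder file names: `reindex_awk urls.yaml > urls.new.yaml`, then compare the two files before replacing the original.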

marc_s
thanasisp
  • Sorry, that was my mistake. It does exactly what it's supposed to do! – John Oct 15 '20 at 22:58
  • Nice. Also in the future consider to always use GNU awk and GNU sed, in case you don't have them both for your OS, I guess you can install them. – thanasisp Oct 15 '20 at 23:05
  • Thanks for this. Far more advanced and intricate than I could handle. I will install with homebrew... `brew install gnu-sed; brew install gnu-awk` – John Oct 15 '20 at 23:10