Re-index two digit strings based on occurrence of a common string

Question

I have a urlwatch .yaml file that has this format:

name: 01_urlwatch update released
url: "https://github.com/thp/urlwatch/releases"
filter:
  - xpath:
      path: '(//div[contains(@class,"release-timeline-tags")]//h4)[1]/a'
  - html2text: re
---
name: 02_urlwatch webpage
url: "https://thp.io/2008/urlwatch/"
filter: 
  - html2text: re
  - grep: (?i)current\sversion  #\s Matches a whitespace character
  - strip # Strip leading and trailing whitespace 
---
name: 04_RansomWhere? Objective-See
url: "https://objective-see.com/products/ransomwhere.html"
filter:
  - html2text: re
  - grep: (?i)current\sversion #\s Matches a whitespace character
  - strip #Strip leading and trailing whitespace
---
name: 05_BlockBLock Objective-See
url: "https://objective-see.com/products/blockblock.html"
filter:
  - html2text: re
  - grep: (?i)current\sversion #(?i) \s 
  - strip #Strip leading and trailing whitespace
---

I need to "re-index" the two-digit number according to the order in which name: occurs. In this example, the first and second occurrences of name: are followed by the correct index numbers, but the third and fourth are not.

In the example above, the third and fourth occurrences of name: should be re-indexed to 03_ and 04_ before the text string, that is, a two-digit index number followed by an underscore.

Also, some lines contain the string #name:, and these should not be counted in the re-indexing. (They have been commented out so that urlwatch does not act on them.)

I tried using sed but had trouble generating an index number based on each occurrence of the string. I don't have GNU sed, but I can install it if that is the only method.

Do not use `sed`. I believe this could be done in an awk script: match the expression, replace it with a new number using `gsub`, and print the line. – KamilCuk, Oct 15 '20 at 21:38
If you are on macOS, install GNU sed and GNU awk anyway, or else you lose a lot of features by using the broken defaults (confirmed by SO Mac users). As Kamil already said, `sed` is not suitable for this job either. – thanasisp, Oct 15 '20 at 22:51

Enlico · Answer 1 · score 3 · edited Nov 04 '22 at 19:12

I think this could be ok:

awk '/^name: / { sub(/[0-9]{2}/, ++i); sub(/ [1-9][^0-9]/,"\x0&"); sub(/\x0 /," 0") }; 1' your_input

On every line starting with name: , we replace the two-digit number ([0-9]{2}) with the counter i after incrementing it (i starts out unset, which awk treats as 0, so the first increment gives 1). A second substitution marks the number with a \x0 character if it has only one digit, and a third substitution replaces that mark with a leading 0.

Probably it's a bit fragile, but given your explanation, it looks fine.
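One concrete source of that fragility: the second substitution is not anchored, so once the counter reaches 10, a lone digit later on the line (as in `name: 12_Foo 5 bar`) would get the leading zero instead. A minimal sketch that anchors every substitution to the `name: ` prefix and drops the `\x0` marker might look like this. It assumes the old index is always exactly two digits right after `name: `, and the function name `reindex_pad` is hypothetical.

```shell
# Hypothetical helper: replace-then-pad, with every substitution anchored
# to the leading "name: " so digits later on the line are never touched.
# Assumes the old index is exactly two digits right after "name: ".
reindex_pad() {
  awk '
    /^name: [0-9][0-9]/ {
      i++                                   # i starts at 0, so the first match gets 1
      sub(/^name: [0-9][0-9]/, "name: " i)  # replace the old index with the bare counter
      if (i < 10)                           # one-digit counter:
        sub(/^name: /, "name: 0")           # insert the leading zero after the prefix
    }
    1                                       # print every line
  ' "$@"
}
```

Usage would be something like `reindex_pad urls.yaml > reindexed.yaml` (both file names are placeholders). `[0-9][0-9]` is written out instead of `[0-9]{2}` so the sketch does not depend on interval-expression support in the local awk.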

answered Oct 15 '20 at 22:02 by Enlico

I just need a two-digit index number. That's what I asked. The commands you wrote skip to a three-digit number after 10 and then between every additional ten: 01, 02, 03, 04, 05, 06, 07, 08, 09, 10, 101, 102, 103, 104, 105, 106, 107, 108, 109, 20, 201, 202, 203, 204, 205, 206, 207, 208, 209, 30, ... Thanks for your help. – John Oct 16 '20 at 16:01
@John, I think it's fixed now, fwiw. – Enlico Oct 16 '20 at 16:22
I did have a couple of upvotes on this question at one point. I guess someone came along and downvoted me. I don't think it's simple for a novice CLI user. – John Oct 16 '20 at 17:04
Downvotes come everywhere, even on the best questions. However, does my fix work now? – Enlico Oct 16 '20 at 18:15

score 3 · Answer 2 · answered Oct 16 '20 at 11:49

This might work for you (GNU sed):

sed -E '/^name:/{x;s/.*/expr & + 1/e;s/^.$/0&/;x;G;s/[0-9]+(.*)\n(.*)/\2\1/}' file

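Match a line beginning with name:, increment a counter kept in the hold space (the e flag, a GNU sed extension, runs the expr command), pad it with a leading 0 when it has only one digit, append the hold space to the pattern space, then match the first run of digits and, using captured groups, substitute the counter for it.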
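For readers without GNU sed, the same counter logic can be sketched as a plain POSIX shell loop that keeps the counter in a shell variable rather than in the hold space. This is a swapped-in technique, not the answer's method: the function name `reindex_sh` is hypothetical, and unlike the sed command it only rewrites an index of exactly two digits directly after `name: `.

```shell
# Hypothetical POSIX sh sketch: the counter lives in the shell variable n
# instead of the sed hold space. Reads standard input, writes standard output.
reindex_sh() {
  n=0
  # IFS= and -r keep leading spaces and backslashes intact; the extra test
  # also processes a final line that has no trailing newline.
  while IFS= read -r line || [ -n "$line" ]; do
    case $line in
      "name: "[0-9][0-9]*)
        n=$((n + 1))
        rest=${line#????????}          # drop the 8-character prefix "name: NN"
        printf 'name: %02d%s\n' "$n" "$rest"
        ;;
      *)
        printf '%s\n' "$line"          # any other line, including #name:, as-is
        ;;
    esac
  done
}
```

Usage would be `reindex_sh < urls.yaml > reindexed.yaml` (placeholder file names). A shell read loop is much slower than awk or sed on large files, but for a urlwatch config it only needs a POSIX shell.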

score 2 · Accepted Answer · edited Nov 04 '20 at 09:59

awk '/^name/{sub(/[0-9]{2}/,sprintf("%02d", ++c))}1' file

For any line starting with "name", we replace the first two-digit number with our counter, which is incremented on every such line; awk's sprintf function with the %02d format adds a leading zero when needed.
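The one-liner can be checked against the question's sample with a sketch like the one below. It wraps the command in a hypothetical function `reindex` and writes `[0-9][0-9]` instead of `[0-9]{2}`; the two mean the same, but the first form also works in awk implementations that may lack interval-expression support. The `#name:` line is an added example entry: it is skipped because `/^name/` only matches at the start of a line.

```shell
# Hypothetical wrapper around the one-liner above.
# [0-9][0-9] is used in place of [0-9]{2} for awks without {n} intervals.
reindex() {
  awk '/^name/ { sub(/[0-9][0-9]/, sprintf("%02d", ++c)) } 1' "$@"
}

# Sample based on the question (filters trimmed), plus a commented-out
# entry that must not be counted.
reindex <<'EOF'
name: 01_urlwatch update released
name: 02_urlwatch webpage
#name: 09_disabled entry
name: 04_RansomWhere? Objective-See
name: 05_BlockBLock Objective-See
EOF

# Expected output:
# name: 01_urlwatch update released
# name: 02_urlwatch webpage
# #name: 09_disabled entry
# name: 03_RansomWhere? Objective-See
# name: 04_BlockBLock Objective-See
```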

edited Nov 04 '20 at 09:59 by marc_s

answered Oct 15 '20 at 22:15 by thanasisp

Sorry, that was my mistake. It does exactly what it's supposed to do! – John Oct 15 '20 at 22:58

Nice. In the future, also consider always using GNU awk and GNU sed; if your OS doesn't ship them, I guess you can install them. – thanasisp Oct 15 '20 at 23:05

Thanks for this. Far more advanced and intricate than I could handle. I will install with Homebrew... `brew install gnu-sed; brew install gawk` – John Oct 15 '20 at 23:10
