Put Numbering before Every Matched String Pattern

Question

I need to replace this:

01:05:01:11 --> 01:05:04:07,so you may continue to support us,|bring us health,
$Italic = True
01:05:04:15 --> 01:05:07:09,well-being,
$Italic = False
01:05:07:21 --> 01:05:13:01,and help us to be one big family|and continue working as a team.

To become essentially this:

1
01:05:01:11 --> 01:05:04:07,so you may continue to support us,|bring us health,
$Italic = True
2
01:05:04:15 --> 01:05:07:09,well-being,
$Italic = False
3
01:05:07:21 --> 01:05:13:01,and help us to be one big family|and continue working as a team.

EDIT_1: In other words, I need to match:

' --> '

And count its occurrences.

EDIT_2: So, for example, I need to match only the lines which contain:

01:05:04:15 --> 01:05:07:09,

And before each such line I need to insert the number of that occurrence (1 for the first match, 2 for the second, and so on).

I've come up with this short shell script using `sed`, but it takes ages to process even a slightly bigger file (more than 60 lines, for example).

# Define the number of the special chars - so you can calculate the number of the subtitle lines
special_chars_no="$(grep -o ' --> ' Output_File | wc -l)"

# Add numbering before every subtitle line
for ((i=1;i<=${special_chars_no};i++)) ;
do
sed -i '/\([0-9][0-9]\):\([0-9][0-9]\):\([0-9][0-9]\):\([0-9][0-9]\) -->/{:1 ; /\(.*\([0-9][0-9]\):\([0-9][0-9]\):\([0-9][0-9]\):\([0-9][0-9]\) -->\)\{'"${i}"'\}/!{N;b1} ; s/\([0-9][0-9]\):\([0-9][0-9]\):\([0-9][0-9]\):\([0-9][0-9]\) -->/'"${i}"'\n\1:\2:\3:\4 -->/'"${i}"' ; :2 ; n ; $!b2}' Output_File  
done

Can we make it usable (much faster)?

Apologies, I needed to modify my input strings - there's one more new line after every line. So when I use your code - the output is: `1 01:05:01:11 --> 01:05:04:07,so you may continue to support us,|bring us health, 2 3 01:05:04:15 --> 01:05:07:09,well-being, 4 5 01:05:07:21 --> 01:05:13:01,and help us to be one big family|and continue working as a team. 6 ` It puts one more number - which is not the desired output. — Xtigyro, Oct 29 '17 at 10:10
Please edit your post and add sample inputs and outputs in code tags. — RavinderSingh13, Oct 29 '17 at 10:25
Yes, I did so. I'm a bit new to StackOverflow - it's now clear I think. Please let me know if it's not. Thank you for the help, Ravinder! — Xtigyro, Oct 29 '17 at 10:28
`1 01:05:01:11,01:05:04:07,so you may continue to support us,|bring us health, 3 01:05:04:15,01:05:07:09,well-being, 5 01:05:07:21,01:05:13:01,and help us to be one big family|and continue working as a team.` The numbers should be '1', '2' and '3' - and not the ones that were generated, Ravinder. — Xtigyro, Oct 29 '17 at 10:49
So you need numbers before lines not like 1st number then line? please confirm once. — RavinderSingh13, Oct 29 '17 at 10:51
You should have noticed by now that your example in your question is not suitable to describe the problem. — Cyrus, Oct 29 '17 at 10:52
May I know why? I want to count the match occurrences - and not every new line. — Xtigyro, Oct 29 '17 at 10:56
@Xtigyro it would be good to add some lines that don't match your search criteria so that sample input/output itself gives good idea of what you need... also, no need for such big lines in sample, you can truncate it.. your attempted code tries to match number format and all, but could your search criteria be simplified to look for just `-->` ? — Sundeep, Oct 29 '17 at 11:21
@Sundeep: If we do this - how will we put the numbering before the pattern? I think it will be replaced by the numbering and a ' --> ' only. — Xtigyro, Oct 29 '17 at 11:37
that is easy to do, all we ask is simple sample that has both lines to be numbered and lines to be left alone... — Sundeep, Oct 29 '17 at 11:39

score 6 · Answer 1 · answered Oct 29 '17 at 12:11

$ awk '/-->/{print ++cnt} 1' file
1
01:05:01:11 --> 01:05:04:07,so you may continue to support us,|bring us health,
$Italic = True
2
01:05:04:15 --> 01:05:07:09,well-being,
$Italic = False
3
01:05:07:21 --> 01:05:13:01,and help us to be one big family|and continue working as a team.
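The asker later mentioned in a comment that every line is followed by a blank line. A quick check, assuming a trimmed copy of the sample saved as `sample.txt` (a hypothetical file name), suggests the same one-liner still numbers only the `-->` lines, because blank lines never match the pattern and so never bump `cnt`:

```shell
# Trimmed sample input with a blank line after every line (sample.txt is a hypothetical name)
printf '%s\n\n' \
  '01:05:01:11 --> 01:05:04:07,so you may continue' \
  '$Italic = True' \
  '01:05:04:15 --> 01:05:07:09,well-being,' > sample.txt

# Only lines matching --> increment cnt; the trailing 1 prints every line, blank or not
awk '/-->/{print ++cnt} 1' sample.txt
```

This prints `1` before the first timestamp line and `2` before the second, with the blank lines passed through unchanged.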

Ed Morton

@Sundeep You're spending too much time with perl :-). – Ed Morton Oct 29 '17 at 12:19
@EdMorton, I think I missed it, was too easy, couldn't understand the question itself :( – RavinderSingh13 Oct 29 '17 at 12:27

Sundeep · Accepted Answer · 2017-10-29T12:19:52.673

`sed` is not well suited to arithmetic, and using a shell loop to process text is not advisable.

$ cat ip.txt 
01:05:01:11 --> 01:05:04:07,so you may continue
$Italic = True
01:05:04:15 --> 01:05:07:09,well-being,
$Italic = False
01:05:07:21 --> 01:05:13:01,and help us to be

$ awk '/-->/{$0 = ++i RS $0} 1' ip.txt
1
01:05:01:11 --> 01:05:04:07,so you may continue
$Italic = True
2
01:05:04:15 --> 01:05:07:09,well-being,
$Italic = False
3
01:05:07:21 --> 01:05:13:01,and help us to be

- `/-->/` if the line matches this regexp:
  - `$0 = ++i RS $0` prefixes the input record with the running match count, separated by the value of `RS` (a newline by default)
  - `i` starts at 0 by default in numeric context, so `++i` gives the incremented count every time a line matches the regexp
- `1` is the idiomatic way to print the contents of the input record `$0`
See also awk save modifications in place
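For saving the result back to the file, a minimal sketch of the usual portable approach is to write to a temporary file and replace the original only if awk succeeded; `ip.txt.tmp` is just an illustrative name. GNU awk 4.1.0 and later also ship an `inplace` extension, shown commented out because other awks do not have it.

```shell
# Recreate the sample input shown above
printf '%s\n' \
  '01:05:01:11 --> 01:05:04:07,so you may continue' \
  '$Italic = True' \
  '01:05:04:15 --> 01:05:07:09,well-being,' \
  '$Italic = False' \
  '01:05:07:21 --> 01:05:13:01,and help us to be' > ip.txt

# Portable: write to a temp file, then overwrite ip.txt only if awk exited successfully
awk '/-->/{$0 = ++i RS $0} 1' ip.txt > ip.txt.tmp && mv ip.txt.tmp ip.txt

# GNU awk 4.1.0+ only: edit the file directly via the inplace extension
# gawk -i inplace '/-->/{$0 = ++i RS $0} 1' ip.txt

cat ip.txt
```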

You can also use `perl`:

# use perl -i -pe for inplace editing
perl -pe 's/^/++$i . "\n"/e if /-->/' ip.txt
# or, borrowing Ed Morton's simplicity
perl -lpe 'print ++$i if /-->/' ip.txt

edited Oct 29 '17 at 12:19

answered Oct 29 '17 at 11:56

Sundeep

Thank you very much, Sundeep! I do need to learn more 'awk' - it appears that it is much more powerful than I thought. – Xtigyro Oct 29 '17 at 12:14
Serious question as I see this all the time in perl scripts: wrt `'print ++$i if /-->/'` - can't you write normal condition-action syntax something like `if /-->/ { print ++$i }'` in perl? If so, why on earth would anyone write yoda-like expressions with what you're going to do first and then the condition under which you'll do it afterwards??? "Drive to work if it's a weekday I will". – Ed Morton Oct 29 '17 at 14:12
that is just my personal preference... syntax with your preference would be `perl -lpe 'if(/-->/){print ++$i}'` ... common to use single line `$a = 5 if $b > 2` instead of `if($b > 2){ $a = 5 }` that would need 2/3 lines in program file (assuming someone follows one statement per line) – Sundeep Oct 29 '17 at 14:22
Any reason to prefer `{$0 = ++i RS $0} 1` over `{print ++i RS $0}`? – randomir Oct 29 '17 at 16:58
@EdMorton, speaking in general (not specific to perl), `if a then b else c` is usually a *statement*, and `b if a else c` is an *expression*. The former (usually) has no (return) value, but can have side-effects, while the latter always evaluates to some value (and usually has no side-effects in a functional-programming sense). In C, ternary operator `a ? b : c` had to be invented to enable such conditional expressions. So, one would use conditional expressions when it needs conditional value, not action. Continuing your example: `state = 'driving' if weekend else 'resting'`. – randomir Oct 29 '17 at 17:11
@randomir & Sundeep Grateful for your responses I am. Consider them I will. – Ed Morton Oct 29 '17 at 20:26
Consider it I have and writing the actions you perform and THEN the conditions under which you perform them is just completely unnecessary obfuscation. It makes the code a bit harder to read (and easier to misunderstand) than simply specifying the condition then the action and any reduction in code size is negligible so there's no reason to do it. Just my opinion of course. – Ed Morton Oct 29 '17 at 23:38
I don't use Perl anymore other than one-liners, but when I did write large programs, this came up often for me and one-line instead of 4 (I like `{}` on separate lines) was preferable.. and it is idiomatic once you get used to it.. just like how significant white-space/comprehensions/lambda/etc took a while to get used to in Python... – Sundeep Oct 30 '17 at 05:54
@randomir using `{print ++i RS $0}` would require another condition to print the other lines where number shouldn't be added... using `/-->/{$0 = ++i RS $0} 1` changes `$0` based on a condition and `1` will always print `$0` whether or not `$0` was changed – Sundeep Oct 30 '17 at 05:55
@Sundeep Yeah, clearly it's idiomatic as I see it all over the place but IMHO it's just one more contributor to http://www.zoitz.com/archives/13. I find I have to write my Python with `# {` at the start and `# }` at the end of the indented blocks so I can visually make sure the indented code is actually what I expected it to be and also so that `vi` can match the start/end of blocks when I want to jump between them so IMHO that was a swing and a miss from the Python guys too. – Ed Morton Oct 30 '17 at 14:52

potong · Answer 3 · 2017-10-29T16:52:09.157

This might work for you (GNU sed):

sed -r '/-->/{x;:a;s/9(_*)$/_\1/;ta;s/^_*$/0&/;s/$/\n0123456789/;s/([^_])(_*)\n.*\1(.).*/\3\2/;y/_/0/;G;p;s/\n.*//;x;d}' file

On encountering the string -->, swap to the hold space (HS) and replace any trailing 9's with _'s. Prepend a 0 if this is the first time (the HS is empty) or if all the characters are _'s. Increment the last digit and then replace all _'s with 0's. Append the current line (now in the HS after the swap) and print the counter followed by the current line. Remove the current line and swap back, leaving the counter in the HS, primed for the next match. Finally delete the pattern space (PS), since the line has already been printed. Lines that do not match are printed as normal.
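The trickiest step is the increment, because sed has no arithmetic. As a sketch (GNU sed assumed, as in the answer), here is just that part of the command pulled out into a hypothetical shell function `inc`, which feeds its argument through the same substitutions:

```shell
# Hypothetical helper: runs the counter-increment part of the sed command above on $1.
#   :a;s/9(_*)$/_\1/;ta             trailing 9s become _ placeholders
#   s/^_*$/0&/                      empty or all-9 counter gets a leading 0 to carry into
#   s/$/\n0123456789/               append a digit lookup table
#   s/([^_])(_*)\n.*\1(.).*/\3\2/   last real digit becomes its successor; table dropped
#   y/_/0/                          placeholders become zeros
inc() {
  printf '%s\n' "$1" |
    sed -r ':a;s/9(_*)$/_\1/;ta;s/^_*$/0&/;s/$/\n0123456789/;s/([^_])(_*)\n.*\1(.).*/\3\2/;y/_/0/'
}

inc ''    # 1 (the real command starts from an empty hold space)
inc 7     # 8
inc 19    # 20
inc 999   # 1000
```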

Thank you very much, potong! This one is a killer re complexity of solution! Nice one! — Xtigyro, Oct 30 '17 at 21:06

RavinderSingh13 · Answer 4 · 2017-10-29T10:22:12.590

Your question is not entirely clear, but based on your expected output, the following awk may help. (I have an old awk, so I added `--re-interval`; with a recent gawk it can be removed.) I am assuming you want to look for a specific string on a line and print its line number.

awk --re-interval '/[0-9]{2}:[0-9]{2}:[0-9]{2}:[0-9]{2}/{print FNR ORS $0}'  Input_file

If you want the count on the same line as the text instead, change ORS to OFS in the code above.
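To illustrate the difference, here is a sketch on a made-up two-line input, using a simplified `/-->/` pattern instead of the timestamp regex so that no `--re-interval` is needed:

```shell
# ORS (a newline by default): the line number goes on its own line above the match
printf '%s\n' 'a --> b' 'no match' | awk '/-->/{print FNR ORS $0}'

# OFS (a space by default): the line number goes on the same line, before the match
printf '%s\n' 'a --> b' 'no match' | awk '/-->/{print FNR OFS $0}'
```

The first command prints `1` and `a --> b` on separate lines, the second prints `1 a --> b`; in both cases the non-matching line is not printed at all.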

If you need to save the output back into Input_file itself, the following may help.

awk --re-interval '/[0-9]{2}:[0-9]{2}:[0-9]{2}:[0-9]{2}/{print FNR ORS $0}'  Input_file > temp_file && mv temp_file  Input_file

EDIT: If you only want to print the line number before each line, the following may help.

awk '{print FNR ORS $0 ORS}'  Input_file

edited Oct 29 '17 at 10:22

answered Oct 29 '17 at 09:51

RavinderSingh13

`1 01:05:01:11 --> 01:05:04:07,so you may continue to support us,|bring us health, 3 01:05:04:15 --> 01:05:07:09,well-being, 5 01:05:07:21 --> 01:05:13:01,and help us to be one big family|and continue working as a team.` It does not count correctly the lines - I need to count only the lines with ' --> '. – Xtigyro Oct 29 '17 at 10:16
@Xtigyro, please check my EDIT now. I'm not sure it's what you need, though; please put any clarifications in your post rather than in comments, then let me know.
`--re-interval` is gawk-specific and became the default for gawk about 5 years ago. If you need that then get a newer gawk as you're missing a ton of useful functionality. – Ed Morton Oct 29 '17 at 12:08
@EdMorton, thank you Ed sir, but I can't install anything so as of now I have to live with it, could try in own lab though. – RavinderSingh13 Oct 29 '17 at 12:28
@RavinderSingh13: Thank you for the help anyway - I appreciate the time that you spent for me! – Xtigyro Oct 29 '17 at 12:47
