Following this post, I want to replace all the HTML structures of:

+++ <details><summary> +++
some description
+++ </summary><div> +++
this
is
going
to be
folded
+++ </div></details> +++

with native AsciiDoc

.some description
[%collapsible]
====
this
is
going
to be
folded
====

in all files of a folder and all its subfolders. If I wanted to replace a single string, I could use any of the methods in this page, but here I have a structure with arbitrary content inside. What is the most canonical/efficient way to do this?

P.S. I presumed my question was clear, but just for clarification: I don't want to replace the two strings above, but the structures around the content. In other words:

  1. +++ <details><summary> +++\n --> .
  2. +++ </summary><div> +++ --> [%collapsible]\n====
  3. +++ </div></details> +++ --> ====

I could replace these in three rounds, but I want to learn how I can do it once.

P.S.2. My question is very similar to this one.

P.S.3. The regex pattern should be something like

(\+{3}\s*<details>[\S\s]*?<summary>\s*\+{3})[\S\s]*?(\+{3}\s*<\/summary>[\S\s]*?<div>\s*\+{3})[\S\s]*?(\+{3}\s*<\/div>[\S\s]*?<\/details>\s*?\+{3})

However I am not able to get sed working. This is as far as I can go:

find . -type f -name "*.adoc" -o -name "*.sci" | xargs sed -n -E '/(\+{3} <details><summary> \+{3})/p'
Foad S. Farimani
  • Welcome to Stack Overflow. SO is a question and answer page for professional and enthusiastic programmers. Add your own code to your question. You are expected to show at least the amount of research you have put into solving this question yourself. – Cyrus Oct 27 '19 at 11:00
  • Dear @Cyrus, I'm not really new to SO. I tried not to clutter the post because, in my experience, that confuses others, but in the other post I linked I mentioned the regex pattern I tried. Beyond that I have no other results because I don't know the right keywords to search for. – Foad S. Farimani Oct 27 '19 at 11:03

1 Answer

EDIT: Since the OP has clarified the question, I am adding code for the clarified requirement.

Assume the following is the Input_file.

cat Input_file
aaaaaa
bbbbbib
<details>
<summary>
singh1
singh2
test1 ba bla bla
</summary>
<div>
whwiuwviweivbw
wivuibwuivweiweg

wkvbwjvbwjbvwbviwrbhb

wvhwrivbwvbwrvbw
</div>
</details>
bfifiefe
fjbfiuebfiewfhbew

jwnjwnjwevbw

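Now run the following code (tested with gawk; RS="^$" reads the whole file as one record, and ORS="" stops awk from appending an extra newline at the end).

awk -v RS="^$" -v ORS="" '
{
  gsub(/<details>\n<summary>\n/,".")                     ##Opening tags become the leading dot of the block title.
  gsub(/<\/summary>\n<div>\n/,"[%collapsible]\n====\n")  ##Middle tags become the option line and the opening delimiter.
  gsub(/<\/div>\n<\/details>/,"====")                    ##Closing tags become the closing delimiter.
}
1
' Input_file

The output will be as follows.

aaaaaa
bbbbbib
.singh1
singh2
test1 ba bla bla
[%collapsible]
====
whwiuwviweivbw
wivuibwuivweiweg

wkvbwjvbwjbvwbviwrbhb

wvhwrivbwvbwrvbw
====
bfifiefe
fjbfiuebfiewfhbew

jwnjwnjwevbw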
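
In the question itself, each marker sits on a line of its own (`+++ <details><summary> +++` and so on), so a line-by-line variant that never loads the whole file may also be worth trying. The following is only a minimal sketch, assuming a POSIX awk and that the three marker lines look exactly like the ones in the question (optional spaces around the tags are tolerated). The function name `convert_collapsible` is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper (name is an assumption, not from the original post).
# Reads the named files, or stdin when none are given, and writes the
# converted text to stdout in a single pass.
convert_collapsible() {
  awk '
    # Opening marker: drop the line; the next line carries the title.
    /^\+\+\+ *<details><summary> *\+\+\+$/ { title = 1; next }
    # Line right after the opening marker: prefix it with "." (block title).
    title { print "." $0; title = 0; next }
    # Middle marker: emit the collapsible option and the opening delimiter.
    /^\+\+\+ *<\/summary><div> *\+\+\+$/ { print "[%collapsible]"; print "===="; next }
    # Closing marker: emit the closing delimiter.
    /^\+\+\+ *<\/div><\/details> *\+\+\+$/ { print "===="; next }
    # Everything else passes through unchanged.
    { print }
  ' "$@"
}
```

Usage would be something like `convert_collapsible Input_file > Output_file`. All three replacements happen in one pass, and the arbitrary content between the markers is never matched by a regex, so it cannot be swallowed by a greedy pattern.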


Could you please try the following? I have tested it with gawk on a single test Input_file and it worked. Please check it on one Input_file first and, once you are happy with the results, run it on your *.html files.

First, store the current text in a shell variable named old_text:

old_text="+++ <details><summary> +++
some description
+++ </summary><div> +++
this
is
going
to be
folded
+++ </div></details> +++"

Now set a shell variable named new_text to the new text you want in the Input_file(s).

new_text=".some description
[%collapsible]
====
this
is
going
to be
folded
===="

Now run the following code on Input_file. It replaces the first occurrence of old and leaves a file without a match unchanged.

gawk -v old="$old_text" -v new="$new_text" -v RS="^$" -v ORS="" -i inplace '
{
  found=index($0,old)
  if(found){
    $0=substr($0,1,found-1) new substr($0,found+length(old))
  }
  print
}
'  Input_file


Explanation: Adding detailed explanation for code.

gawk -v old="$old_text" -v new="$new_text" -v RS="^$" -v ORS="" -i inplace '   ##Starting gawk program here; awk variable old gets the value of shell variable old_text and awk variable new gets the value of new_text.
                                                                                ##Setting RS (record separator) to ^$ makes the whole file a single record; ORS="" stops print from adding an extra newline.
{                                                                               ##Starting main BLOCK here.
  found=index($0,old)                                                           ##index returns the position (starting from 1) where old begins in the record, or 0 if it is not present.
  if(found){                                                                    ##Checking condition if old was found then do following.
    $0=substr($0,1,found-1) new substr($0,found+length(old))                    ##Keeping everything before old, then new, then everything after old, so nothing else is removed.
  }
  print                                                                         ##Printing the record always, so a file without a match is written back unchanged instead of being emptied.
}                                                                               ##Closing main BLOCK here.
'  Input_file                                                                   ##Mentioning Input_file name here.
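
The question asks for the conversion in all files of a folder and its subfolders. One way this could be done is to let `find` hand batches of files to a small shell loop. The following is a sketch only, assuming POSIX `find`, `sh` and `awk`, plus the widely available `mktemp`. The function name `convert_tree` and the variable `AWK_PROG` are made up for illustration, and the awk program is a line-by-line variant that expects each `+++ ... +++` marker on its own line, as in the question. Note the `\( ... \)` grouping: without it, `-type f` applies only to the first `-name` test.

```shell
#!/bin/sh
# Hypothetical awk program (an assumption): converts the three marker lines
# from the question and passes every other line through unchanged.
AWK_PROG='
  /^\+\+\+ *<details><summary> *\+\+\+$/ { title = 1; next }
  title { print "." $0; title = 0; next }
  /^\+\+\+ *<\/summary><div> *\+\+\+$/ { print "[%collapsible]"; print "===="; next }
  /^\+\+\+ *<\/div><\/details> *\+\+\+$/ { print "===="; next }
  { print }
'
export AWK_PROG

# Hypothetical helper: rewrite every *.adoc and *.sci file under directory $1.
convert_tree() {
  find "$1" -type f \( -name '*.adoc' -o -name '*.sci' \) -exec sh -c '
    for f do
      tmp=$(mktemp) || exit 1
      # Overwrite the original only if awk succeeded; "cat >" keeps the
      # permissions of the original file.
      awk "$AWK_PROG" "$f" > "$tmp" && cat "$tmp" > "$f"
      rm -f "$tmp"
    done
  ' sh {} +
}
```

Calling `convert_tree .` would then process the current folder recursively. Since the files are rewritten in place, trying it on a copy of the folder (or under version control) first is advisable.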
RavinderSingh13
  • I'm trying to understand your answer. I know a bit of regex, but have never used gawk tbh. How did you recognize the HTML code structure I mentioned in the post? – Foad S. Farimani Oct 27 '19 at 16:04
  • @Foad,I have added detailed explanation now, kindly do let me know in case of any queries please. – RavinderSingh13 Oct 27 '19 at 16:23
  • Sorry if my question seems rude, but did you assume that I want to replace one string with another? The internals of the structure I mentioned are just arbitrary text. For example, the string "some description" never appears in the files. I want to replace the structure, not the strings. That's why I tagged regex. – Foad S. Farimani Oct 27 '19 at 16:31
  • @Foad, Sorry if I didn't get your question. It is not a single string, it is number of lines which you can mention in strings(1 which needs to be replaced and 1 which needs to be put there newly). If this is not the case then let us discuss more clearly on this one with more examples please. – RavinderSingh13 Oct 27 '19 at 16:52
  • I edited the post for clarification. Basically, I don't want to replace one multiline string with another, but to replace the HTML structure around the arbitrary content with the AsciiDoc one. – Foad S. Farimani Oct 27 '19 at 16:58
  • @Foad, ok got it, it means `+++ ` -----> `+++ ` is a TAG right? Or it is a tag from `+++ ` ----> `` is a TAG? Could you please confirm once on same? – RavinderSingh13 Oct 27 '19 at 17:04
  • The HTML tags are `<details>...</details>`, `<summary>...</summary>`, and `<div>...</div>`. The `+++ ... +++` is the AsciiDoc tag for embedding HTML, AFAIK. – Foad S. Farimani Oct 27 '19 at 17:16
  • @Foad, Could you please check my EDIT solution now and let me know? I have tested for single tag occurrence only as of now, lemme know how it goes then? – RavinderSingh13 Oct 27 '19 at 17:30
  • I will look into the `awk`'s `gsub` command and will come back here. – Foad S. Farimani Oct 27 '19 at 22:14