I want to write a bash script that identifies a tag in a text file matching a multi-line pattern, so that I can use the tag's id to process nested tags later. I've searched through multiple questions, but they all seem to fall short in one way or another, making it difficult to progress. What I have managed so far is to match the patterns and get the matched lines, but the result comes out as a single output (I believe). First, here is the sample text file I am testing with.
random words to put here: dresser car street space
*
********************************************************************************
********************************************************************************
-->
interested data: name="someFile_1.txt"random data
endMultilinePattern
<!--****************Random comment***************-->
startMultilinePattern id="someFileTag_2"
interested data: name="someFile_2.txt"random data
endMultilinePattern
<!--****************Random comment***************-->
startMultilinePattern id="someFileTag_3"
interested data: name="someFile_3.txt"random data
endMultilinePattern
some random data body
some random nested data filepath="/" uuid="randomcharacters"random data
some random data body
more random data
endMultilinePattern
startMultilinePattern id="someFileTag_2"
interested data: name="error_someFileTag_2.txt"random data
endMultilinePattern
<!--****************Random comment***************-->
Here are some outputs I've gotten and the answers that led to them. Perhaps through poor understanding on my part, I may not be using the commands properly. First of all, the id I am interested in is in `startMultilinePattern id="someFileTag_2">`; I will use `id` later down in the file to match other tags that use that `id`. Secondly, I want to grab the attribute `name` in the `interested data: name="..."random data` tag in order to search for that file in the filesystem for further processing. In this question, all I want to do right now is get `startMultilinePattern> ... multi-line match ... endMultilinePattern` and then grab the file name within the `interested data: name="..."random data` tag. Here we go:
The following makes use of the `-P` option in grep for Perl regular expressions. Although it gets the proper output, I can't seem to read it into an array and output each multi-line match.
Src: grep (bash) multi-line pattern
$ grep -Pzon "((startMultilinePattern )(.|\n)*?(endMultilinePattern))" test.txt | while read -a grepOut; do POS=$((POS+1)) && echo "0=${grepOut[0]}, 1=${grepOut[1]}, 2=${grepOut[2]}, 3=${grepOut[3]}}";done
0=1:startMultilinePattern, 1=id="someFileTag_2", 2=, 3=}
0=interested, 1=data:, 2=name="someFile_2.txt"random, 3=data}
0=endMultilinePattern1:startMultilinePattern, 1=id="someFileTag_3", 2=, 3=}
0=interested, 1=data:, 2=name="someFile_3.txt"random, 3=data}
0=endMultilinePattern1:startMultilinePattern, 1=id="someFileTag_2", 2=, 3=}
0=interested, 1=data:, 2=name="error_someFileTag_2.txt"random, 3=data}
# grep command by itself provides the following output:
1:startMultilinePattern id="someFileTag_2"
interested data: name="someFile_2.txt"random data
endMultilinePattern1:startMultilinePattern id="someFileTag_3"
interested data: name="someFile_3.txt"random data
endMultilinePattern1:startMultilinePattern id="someFileTag_2"
interested data: name="error_someFileTag_2.txt"random data
endMultilinePattern
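A note on why the matches run together (`endMultilinePattern1:startMultilinePattern`): with `-z`, grep treats the input as NUL-separated, so the whole file is a single record (hence every match is numbered `1:`) and each `-o` match is terminated by a NUL byte rather than a newline. `read -a` splits on newlines and whitespace instead, so it never sees a whole block. Below is a minimal sketch of reading one block per array element, assuming GNU grep with PCRE support and bash; `read_blocks` and `blocks` are hypothetical names chosen for this example, and `-n` is dropped because it only ever reports `1:` here.

```shell
#!/usr/bin/env bash
# Sketch only. Assumes GNU grep with PCRE support (-P) and bash.
# read_blocks and blocks are hypothetical names chosen for this example.

blocks=()

read_blocks() {
    local file=$1 block
    blocks=()
    # -z: treat the input as NUL-separated, so the whole file is one record
    #     and each -o match is written out followed by a NUL byte.
    # (?s): let "." match newlines; ".*?" stops at the first end marker.
    # read -d '' reads up to the next NUL, i.e. exactly one match.
    while IFS= read -r -d '' block; do
        blocks+=("$block")
    done < <(grep -Pzo '(?s)startMultilinePattern .*?endMultilinePattern' "$file")
}

# Usage (test.txt is the sample file from the question):
#   read_blocks test.txt
#   for i in "${!blocks[@]}"; do printf '[%d]\n%s\n' "$i" "${blocks[i]}"; done
```

The process substitution (`< <(...)`) keeps the loop in the current shell, so `blocks` is still populated after the loop ends; piping grep into `while` would fill the array in a subshell and lose it.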
Using sed, which should presumably be more suitable, I found this interesting answer, but I have not been able to make it work. It uses some funky `start` keywords I don't understand. Src: https://unix.stackexchange.com/questions/112132/how-can-i-grep-patterns-across-multiple-lines
sed -n '/\startMultilinePattern /{:start /endMultilinePattern/!{N;b start};/startMultilinePattern .*\n.*\n.*endMultilinePattern/p}' test.txt
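For reference, the `start` keyword in that command is a sed label: `:start` defines it, `N` appends the next input line to the buffer, and `b start` jumps back to the label. The sketch below spreads the same loop over several lines with comments. It is a simplified variant, not the linked answer verbatim: it prints whatever block it collects instead of applying the final three-line check, and `print_blocks` is a hypothetical name. Note also that, as written in the command above, the address begins with `\s`, which GNU sed reads as "any whitespace character", so it may not match a line that starts with `startMultilinePattern`; the sketch uses no backslash there.

```shell
#!/usr/bin/env bash
# Sketch only: the label/branch loop, one command per line.
# print_blocks is a hypothetical name chosen for this example.

print_blocks() {
    sed -n '
    /startMultilinePattern /{
        # ":start" defines a label named "start"
        :start
        # while the buffer has no end marker yet ...
        /endMultilinePattern/!{
            # ... append the next input line to the buffer
            N
            # and jump back to the label
            b start
        }
        # the buffer now holds one whole block, so print it
        p
    }' "$1"
}

# Usage (test.txt is the sample file from the question):
#   print_blocks test.txt
```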
Additionally, the following sed command supposedly works, as it appears in numerous answers, but perhaps it relies on old functionality. I can't get it to work, as the output isn't what I intended: it includes part of the text I DON'T WANT, i.e., <some random data body ....
Src: https://unix.stackexchange.com/a/112134/388443
$ sed -e '/startMultilinePattern /,/endMultilinePattern/!d' test.txt
startMultilinePattern id="someFileTag_2"
interested data: name="someFile_2.txt"random data
endMultilinePattern
startMultilinePattern id="someFileTag_3"
interested data: name="someFile_3.txt"random data
endMultilinePattern
startMultilinePattern id="someFileTag_2"
interested data: name="error_someFileTag_2.txt"random data
endMultilinePattern
There are other answers with their own approaches. Some use awk, but I don't know awk, so I didn't try them. I also cannot use pcregrep because I don't have root permissions to install it. From what I understand, grep -P is more or less equivalent to pcregrep. Ideas?
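To make the second half of the goal concrete (pulling the id and the file name out of one matched block), here is a minimal sketch using bash's built-in `[[ =~ ]]` regex matching. It assumes the block is already available as a single string, and `parse_block` is a hypothetical helper name, not something from the linked answers.

```shell
#!/usr/bin/env bash
# Sketch only. parse_block is a hypothetical helper name.
# Prints "id<TAB>name" for one block passed as a single string.

parse_block() {
    local block=$1 id_re name_re id name
    # Keeping the patterns in variables avoids quoting problems inside [[ ]].
    id_re='startMultilinePattern id="([^"]*)"'
    name_re='name="([^"]*)"'
    [[ $block =~ $id_re ]] || return 1
    id=${BASH_REMATCH[1]}       # first capture group: the id value
    [[ $block =~ $name_re ]] || return 1
    name=${BASH_REMATCH[1]}     # first capture group: the file name
    printf '%s\t%s\n' "$id" "$name"
}

# Usage:
#   parse_block "$some_block_string"
```

The function returns a non-zero status when either attribute is missing, so a caller can skip malformed blocks.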