I want to replace the part of a file that matches a regexp. The catch is that the match has to work across the whole file as a single string, the way grep -Pzo does, but as far as I know sed is line-based.
I have tried to force sed to do this by manipulating IFS, but I am still inexperienced in bash and not really sure what I'm doing.
I hope you can help me clarify the things I don't understand.
So I made something like this:
OIFS=$IFS
IFS=""
content=$(cat -v file | sed 's/(?<=<\/div>(?!.*\/div>)).*//')
# Remove everything from the last </div> to the end of the file.
IFS=$OIFS
But it doesn't work as I intended. I was also experimenting with perl to make this substitution, but the problem seems to be the same.
I would appreciate any tips.
EDIT:
As requested in the comments below, here is some example data:
Source:
<html>
<body>
<div>
some site with many <div> divs </div>
<div> and more <div> even more </div> </div>
</div> <!-- last div closing -->
This is all to be deleted
</body>
</html>
Expected result after applying s/<\/div>(?<=<\/div>(?!.*\/div>)).*//s:
<html>
<body>
<div>
some site with many <div> divs </div>
<div> and more <div> even more </div> </div>
EDIT 2:
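I found a simpler way than the ones suggested below:
cat file | perl -0pe 's/(?<=<\/div>(?!.*\/div>)).*//s'
-0 sets the input record separator to the null character, so perl reads the whole file as a single record (as long as it contains no NUL bytes) instead of looping through lines. The /s modifier lets . match newlines, so the pattern can span lines.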