Perl script to search and replace multiple lines in multiple html files

Question

I have many HTML files in a folder. I need to remove a <div id="user-info" ...>...</div> block from all of them. As far as I know, I need a Perl script for that, but I don't know Perl well enough to write one. Could someone help me with it?

Here is what the "bad" code looks like:

<div id="user-info" class="logged-in">
    <a class="icon icon-key-delete" href="https://test.dev/login.php?0,logout=1">Log Out</a>
    <a class="icon icon-user-edit" href="https://test.dev/control.php">Control Center</a>


</div> <!-- end of div id=user-info -->

Thank you in advance!

http://stackoverflow.com/questions/1030787/multiline-search-replace-with-perl — alestanis, Oct 26 '12 at 21:15
Sorry, I wasn't able to figure that out. I did not have any PERL experience before. — user1751343, Oct 26 '12 at 21:57
Just a word of advice, we much prefer if you have attempted to solve this problem by yourself rather than asking the community to arrive at a complete solution for you, even if your attempt is completely broken. Thanks. — Kev, Oct 29 '12 at 00:04

score 3 · Answer 1 · answered Oct 26 '12 at 21:27

Using XML::XSH2:

for { glob '*.html' } {
    open :F html (.) ;
    delete //div[@id="user-info" and @class="logged-in"] ;
    save :b ;
}

choroba


kbenson · Accepted Answer · 2012-10-27T00:05:23.817

perl -0777 -i.withdiv -pe 's{<div[^>]+?id="user-info"[^>]*>.*?</div>}{}gsmi;' test.html

-0777 means split on nothing, so the whole file is slurped in at once (instead of line by line, the default for -p).

-i.withdiv means alter files in place, leaving a backup of the original with the extension .withdiv (without -i, -p just prints the result to standard output).

-p means loop over the input, run the code given with -e on each record (normally a line, but here the whole file because we are slurping), and print the result.

-e expects code to run.

man perlrun or perldoc perlrun for more info.
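
For anyone who would rather keep this in a .pl file than a one-liner, the flags above map roughly onto the script below. This is only a sketch: the file name strip_div.pl and the helper name strip_user_info are made up for illustration, and the regex is the same one used in the one-liner (without /m, which has no effect here because the pattern contains no ^ or $).

```perl
#!/usr/bin/perl
# strip_div.pl (hypothetical name): script form of the one-liner above.
use strict;
use warnings;

# Remove every <div ... id="user-info" ...>...</div> block from a string.
# strip_user_info is an illustrative helper name, not part of any module.
sub strip_user_info {
    my ($html) = @_;
    $html =~ s{<div[^>]+?id="user-info"[^>]*>.*?</div>}{}gsi;
    return $html;
}

# Rewrite each file named on the command line, keeping a .withdiv backup.
for my $file (@ARGV) {
    open my $in, '<', $file or die "Cannot read $file: $!";
    my $html = do { local $/; <$in> };   # undef $/ slurps the whole file, like -0777
    close $in;

    # Keep the original under a new name, as -i.withdiv does.
    rename $file, "$file.withdiv" or die "Cannot back up $file: $!";

    open my $out, '>', $file or die "Cannot write $file: $!";
    print {$out} strip_user_info($html);
    close $out or die "Cannot close $file: $!";
}
```

It would be run as perl strip_div.pl *.html. Note that, like the one-liner, it stops at </div>, so a trailing comment such as <!-- end of div id=user-info --> stays in the file.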

Here's another solution, which will be slightly more familiar to people who know jQuery, as the syntax is similar. This uses Mojolicious' ojo module to load the HTML content into a Mojo::DOM object, transform it, and then print the transformed version:

perl -Mojo -MFile::Slurp -E 'for (@ARGV) { say x(scalar(read_file $_))->at("#user-info")->replace("")->root; }' test.html test2.html test*.html

To replace content directly:

perl -Mojo -MFile::Slurp -E 'for (@ARGV) { write_file( $_, x(scalar(read_file $_))->at("#user-info")->replace("")->root ); }' test.html

Note, this won't JUST remove the div, it will also re-write the content based on Mojo's Mojo::DOM module, so tag attributes may not be in the same order. Specifically, I saw <div id="user-info2" class="logged-in"> rewritten as <div class="logged-in" id="user-info2">.

Mojolicious requires at least perl 5.10, but beyond that there are no non-core requirements.

Thank you very much! Could you please tell me how can I make a .pl file to do this? — user1751343, Oct 26 '12 at 21:32
@user1751343 Beware! If you have a `div` inside your `div id="user-info"` this will do funky things on your html. — alestanis, Oct 26 '12 at 22:04
That's very true. Use at your own risk. The other solution, using XML::XSH2 will give you more correct results in this case. — kbenson, Oct 26 '12 at 23:08
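
To make the caveat in the comments above concrete, here is a small sketch using only core Perl. The input is invented for illustration (the original question's div has no nested div); it shows what the same substitution does when the target div does contain one.

```perl
use strict;
use warnings;

# Hypothetical input: the target div contains another div.
my $html = <<'HTML';
<div id="user-info" class="logged-in">
    <div class="inner">Log Out</div>
    <a href="control.php">Control Center</a>
</div>
<p>rest of page</p>
HTML

# Same substitution as the one-liner; .*? stops at the FIRST </div> it finds,
# which here is the inner div's closing tag, not the outer one.
(my $stripped = $html) =~ s{<div[^>]+?id="user-info"[^>]*>.*?</div>}{}gsi;

print $stripped;
```

The opening tag and the inner div are removed, but the "Control Center" link and the outer </div> are left behind, so the page ends up with an unbalanced closing tag. A real HTML parser, as in the XML::XSH2 or Mojo::DOM approaches, avoids this.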
