How to insert lines at a specific location in a file using a Perl script

Question

My problem: I'm trying to read an HTML file (index.html), find all the links in it, and put them in a second file named salida.html. I read this answer and this answer, and I tried to follow them, but it didn't work for me. This is my Perl code:

use strict;
use warnings;
use 5.010;
use Tie::File;

my $entrada='index.html';
my $salida='salida.html';
open(A,"<$entrada");
my @links;  
foreach my $linea (<A>){
    print "Renglon => $linea\n" if $linea =~ m/a href/;
    #print $B $linea if $linea =~ m/a href/;
    push @links, $linea if $linea =~ m/a href/;
}

tie my @resultado, 'Tie::File', 'salida.html' or die "Nelson";
for (@resultado) {
    if ($_ =~ m/<main class="contenido">/){
        foreach my $found (@links){
            $_ .= '<br/>'.$found;
        }
        last;
    }
}
close(A);

My Perl code runs without problems, but in the for loop I'm trying to write the links stored in my @links array into a specific part of my salida.html file:

<!DOCTYPE html>
<html lang="es-mx">

<head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <meta http-equiv="X-UA-Compatible" content="ie=edge">
    <title>Resultados de la busqueda</title>
    <link rel="stylesheet" href="style-salida.css">
</head>

<body>
    <div class="contenedor">
        <header class="header">
            <h2>Resultados de la busqueda</h2>
        </header>
        *<main class="contenido">

        </main>*
        <footer class="footer">
            <h4>
                Gerardo Saucedo Arevalo - 15092087 - Topicos selectos de tecnologias web - Búsqueda de enlaces dentro de
                una página web
            </h4>
        </footer>
    </div>
</body>

</html>

But my code always adds the lines at the end of the file. I ran this code once and it worked perfectly, but then I added some lines, and when I ran it again it didn't work. I restored my file to the state it was in when the code worked, but it still doesn't work. What am I doing wrong?

Your example seems incomplete: `index.html` is missing (I guess the included HTML is `salida.html`). Furthermore: always use an HTML parser, e.g. [HTML::TreeBuilder](https://metacpan.org/pod/HTML::TreeBuilder), to parse the HTML and then operate on the DOM instead. — Stefan Becker, Mar 18 '19 at 07:07
FYI: [never use a regex to parse HTML/XML/...](https://stackoverflow.com/questions/1732348#1732454) — Stefan Becker, Mar 18 '19 at 07:50

score 0 · Answer 1 · answered Mar 18 '19 at 07:36

Always process HTML or XML with an appropriate parser, and then do your processing on the DOM. My solution uses HTML::TreeBuilder. As your question doesn't include the contents of index.html, I have appended my own sample to the script's DATA section:

#!/usr/bin/perl
use warnings;
use strict;

use HTML::TreeBuilder;

# Extract links from <DATA>
my $root1 = HTML::TreeBuilder->new->parse_file(\*DATA)
    or die "HTML: $!\n";

my @links = $root1->look_down(_tag => 'a');

# Process salida.html from STDIN
my $root2 = HTML::TreeBuilder->new;
$root2->ignore_unknown(0);
$root2->parse_file(\*STDIN)
    or die "HTML: $!\n";

# insert links in correct section
if (my @nodes = $root2->look_down(class => 'contenido')) {
    $nodes[0]->push_content(@links);
}

print $root2->as_HTML(undef, '  '), "\n";

# IMPORTANT: must delete manually
$root2->delete;
$root1->delete;

exit 0;

__DATA__
<!DOCTYPE html>
<html>
  <head>
    <title>test</title>
  </head>
  <body>
    <div>
      <a href="link1.html">Link 1</a>
      <a href="link2.html">Link 2</a>
    </div>
  </body>
</html>

Test run:

$ perl dummy.pl <dummy.html
<!DOCTYPE html>
<html lang="es-mx">
...
 <main class="contenido"> <a href="link1.html">Link 1</a><a href="link2.html">Link 2</a></main> 
...
</html>

That works fine for me. Now I'm trying to save the result in the salida.html file. I opened the file with `open (RES,">$salida") || die ($!);` and I overwrite it with `print RES $res;`, where `$res= $root2->as_HTML(undef, ' '), "\n";`. Am I doing it right? — KillerBee GSA, Mar 18 '19 at 18:51
Then you can't read the file from STDIN; instead, read from the file directly with `->parse_file("salida.html")` (the same method also accepts file names). Please do not use the insecure 2-parameter open and old-style filehandles; use `open(my $ofh, '>', $salida); print $ofh $root2->as_HTML(); close($ofh);` — Stefan Becker, Mar 18 '19 at 20:54
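
Putting the advice from these comments together, a file-based version might look roughly like the sketch below. It is an illustration, not the answer's original code: the sub name `insert_links` and the error messages are made up here, and it assumes both files are small enough to parse fully into memory before salida.html is reopened for writing.

```perl
#!/usr/bin/perl
use strict;
use warnings;

use HTML::TreeBuilder;

# Hypothetical helper (name invented for this sketch): move every <a>
# element found in $entrada into the first element of $salida whose
# class is 'contenido', then rewrite $salida in place.
sub insert_links {
    my ($entrada, $salida) = @_;

    # parse_file also accepts a file name, as noted in the comment above
    my $source = HTML::TreeBuilder->new;
    $source->parse_file($entrada) or die "Cannot parse $entrada: $!\n";
    my @links = $source->look_down(_tag => 'a');

    my $target = HTML::TreeBuilder->new;
    $target->ignore_unknown(0);    # same setting as in the answer
    $target->parse_file($salida) or die "Cannot parse $salida: $!\n";

    if (my @nodes = $target->look_down(class => 'contenido')) {
        $nodes[0]->push_content(@links);
    }

    # The tree is already in memory, so the file can be reopened for
    # writing: 3-argument open with a lexical filehandle.
    my $html = $target->as_HTML(undef, '  ');
    open(my $ofh, '>', $salida) or die "Cannot write $salida: $!\n";
    print {$ofh} $html, "\n";
    close($ofh) or die "Cannot close $salida: $!\n";

    # Trees must be deleted manually, as in the answer
    $target->delete;
    $source->delete;

    return scalar @links;
}

# Command-line use: perl script.pl index.html salida.html
insert_links(@ARGV) if @ARGV == 2;
```

Building the output string before opening the file for writing means a failure while serializing does not leave salida.html truncated. The sub returns the number of links it moved, which makes it easy to check that something was actually found.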
