I am trying to parse an HTML page that I have loaded with Perl. I need to extract, for example, src="asd/jkl/xyz.css" from the HTML response so that I can rewrite the path as an absolute one.

The reason is that I need the CSS inline in the head of an e-mail ...

My plan is:

  1. Load the page via Perl.
  2. Get the src of each linked CSS file.
  3. Load the CSS files via Perl.
  4. Parse the CSS and put the contents of the CSS files into the head tag of my generated e-mail.

Does anyone have a better idea or a working regex?

AstroCB
SirApfel
  • use... parser... not... regex... – tenub Jan 15 '14 at 17:13
  • possible duplicate of [RegEx match open tags except XHTML self-contained tags](http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags) – Aaron Miller Jan 15 '14 at 17:42
  • If you want to use a regex, you have to show the exact text you will be parsing: there's a difference between `` and ``, for example. Having said that, it is rarely a good idea to parse HTML with regex. Use a [real](http://search.cpan.org/dist/HTML-Parser/Parser.pm) [HTML parser](http://search.cpan.org/~gaas/HTML-Parser-3.71/lib/HTML/TokeParser.pm) as tenub suggested. – ThisSuitIsBlackNot Jan 15 '14 at 17:43

1 Answer

Try something like this:

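    #!/usr/bin/env perl
    use strict;
    use warnings;

    use XML::LibXML;

    my $parser = XML::LibXML->new();
    # recover => 2 keeps parsing malformed HTML and suppresses parser warnings.
    my $doc = $parser->load_html(location => "http://mywebsite.com", recover => 2);

    # <link> elements reference their stylesheet through href, not src.
    # Print the attribute value rather than the serialized attribute node.
    print $_->value, "\n" for $doc->findnodes('//link[@rel="stylesheet"]/@href');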
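
To turn the extracted paths into absolute URLs, each one can be resolved against the address of the page with `URI->new_abs`. The following is only a minimal sketch: it assumes the stylesheets are referenced through `href`, it parses an in-memory HTML string instead of fetching a page, and the base URL and markup are made up for illustration.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

use XML::LibXML;
use URI;

# Hypothetical page URL and markup, used only for illustration.
my $base = 'http://mywebsite.com/shop/index.html';
my $html = <<'HTML';
<html><head>
<link rel="stylesheet" href="asd/jkl/xyz.css">
<link rel="stylesheet" href="/static/base.css">
</head><body></body></html>
HTML

my $doc = XML::LibXML->load_html(string => $html, recover => 2);

# Resolve every stylesheet href against the URL the page was loaded from.
my @css_urls = map { URI->new_abs($_->value, $base)->as_string }
               $doc->findnodes('//link[@rel="stylesheet"]/@href');

print "$_\n" for @css_urls;
```

Resolving against the full page URL (rather than just the host) matters because a relative path such as `asd/jkl/xyz.css` is interpreted relative to the directory of the page, while `/static/base.css` is interpreted relative to the site root.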

Reference: http://metacpan.org/pod/XML::LibXML
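
For the last step of the plan, putting the CSS into the head of the e-mail, one possible approach is to replace each `link` element in the parsed document with a `style` element that holds the stylesheet text. This is a rough sketch under stated assumptions: the `%css_for` hash is a hypothetical stand-in for the downloaded CSS (a real script would fetch each URL, for example with HTTP::Tiny or LWP::UserAgent), and the markup is invented for the example.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

use XML::LibXML;

# Hypothetical stand-in for the download step: CSS text keyed by absolute URL.
my %css_for = (
    'http://mywebsite.com/asd/jkl/xyz.css' => 'p { color: red; }',
);

# Made-up markup; the href is assumed to be absolute already.
my $html = '<html><head>'
         . '<link rel="stylesheet" href="http://mywebsite.com/asd/jkl/xyz.css">'
         . '</head><body><p>Hi</p></body></html>';

my $doc = XML::LibXML->load_html(string => $html, recover => 2);

for my $link ($doc->findnodes('//link[@rel="stylesheet"]')) {
    my $css = $css_for{ $link->getAttribute('href') // '' };
    next unless defined $css;    # leave links we have no CSS for untouched

    # Build <style type="text/css">...</style> and swap it in for the <link>.
    my $style = $doc->createElement('style');
    $style->setAttribute(type => 'text/css');
    $style->appendText($css);
    $link->replaceNode($style);
}

my $inlined = $doc->toStringHTML;
print $inlined;
```

Working on the parsed tree avoids editing the HTML with string substitutions, and any `link` whose CSS could not be loaded is simply left in place.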

szabgab
Stephan