
I'm looking for a way to determine the code-to-text ratio of a web page in Perl. I don't need anything complex, just a simple printout like "HTML Code: 75% Text: 25%", for SEO reasons.

Blnukem
  • I would like to take a webpage as a file, place it into a variable, and then determine the percentage that is HTML code and the percentage that is visible text. – Blnukem Mar 07 '12 at 14:41
  • HTML tags are rarely "code". How about using the term "markup"? – mob Mar 07 '12 at 18:13
  • HTML tags are always "code". They aren't "program code" but they are "code". (As is ROT13) – Quentin Mar 07 '12 at 22:36

2 Answers

Use HTML::TreeBuilder to extract the text, then compare its length with the length of the whole document.

#!/usr/bin/perl

use strict;
use warnings;
use v5.10;

use LWP::Simple;
use HTML::TreeBuilder;

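# Fetch the page named on the command line
my $url = shift @ARGV or die "Usage: $0 URL\n";
my $content = get($url);
die "Couldn't get $url\n" unless defined $content and length $content;

# Parse the document and keep only its text nodes
my $text = HTML::TreeBuilder->new_from_content($content)->as_text;

my $html_size = length $content;
my $text_size = length $text;

# Share of the document that is text; the remainder is counted as markup
my $percentage = 100 * ( $text_size / $html_size );

say sprintf "HTML Code:%.1f%% Text:%.1f%%", 100 - $percentage, $percentage;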
Quentin

Hmm... thinking quickly, how about stripping the tags with a regex and counting what gets removed as HTML and what is left as text:

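use strict;
use warnings;

my $htmllength = 0;
my $textlength = 0;
while (<>) {
    # Remove each tag, adding its length to the HTML count
    s/(<[^>]*>)/$htmllength += length($1); "";/eg;
    # Whatever is left on the line (including the newline) counts as text
    $textlength += length($_);
}

my $total = $htmllength + $textlength;
die "No input\n" unless $total;
printf "HTML Code: %.1f%%\n", 100 * $htmllength / $total;
printf "Text     : %.1f%%\n", 100 * $textlength / $total;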

You can then simply run the script on the file(s) in question. If you pass several files, the counts are combined into a single pair of percentages:

perl SCRIPT file1.html file2.html

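NOTE: this will not work if your data contains any CDATA sections. Because it reads the input line by line, it also misses tags that are split across several lines.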
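If those limitations matter, a rough way around them is to scan the whole document as one string instead of line by line. The sketch below is only that, a sketch: the function name count_markup_and_text is made up for illustration, and it assumes that comments and CDATA delimiters count as markup while the contents of a CDATA section count as text. It is still regex-based, so it is not a real HTML parser.

```perl
use strict;
use warnings;

# Hypothetical helper: returns (markup length, text length) for an HTML string.
sub count_markup_and_text {
    my ($html) = @_;
    my $textlen = 0;

    # Walk the string token by token; \G anchors each match where the
    # previous one ended, and /s lets "." cross newlines.
    while ($html =~ /\G(?:
          <!--.*?-->                 # comment: markup
        | <!\[CDATA\[(.*?)\]\]>      # CDATA: contents ($1) are text
        | <[^>]*>                    # ordinary tag, may span lines
        | ([^<]+)                    # run of text ($2)
        | (<)                        # stray "<" with no closing ">" ($3)
    )/gcsx) {
        $textlen += length $1 if defined $1;
        $textlen += length $2 if defined $2;
        $textlen += 1         if defined $3;
    }

    # Everything that was not counted as text is treated as markup
    return (length($html) - $textlen, $textlen);
}

# Example with a made-up sample string
my $sample = "<p>Hello <b>world</b></p>";
my ($markup, $text) = count_markup_and_text($sample);
my $total = $markup + $text;
printf "HTML Code: %.1f%%\n", 100 * $markup / $total;
printf "Text     : %.1f%%\n", 100 * $text / $total;
```

To use it on a file, read the file into a single string first (for example with local $/ set to undef) and pass that string to the function. Like the original script, it counts whitespace between tags as text.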

Wes Hardaker