I have a function in my script that is supposed to take a string of HTML and return the same string, except that every heading element is moved two levels down (i.e. h1->h3, h2->h4, etc.). This of course needs to work regardless of casing, and it must not remove attributes. I'm not about to use a full HTML parser for such a fairly simple task, so I figured I'd do it with regexes. The problem is that, being fairly new to VBScript, I don't know how to achieve the desired effect.

What I have currently is this:

Function fiksoverskrifter(html)
   Dim regex, matches, match
   Set regex = New RegExp
   regex.Pattern = "<(/?)h([0-9])(.*?)>"
   regex.IgnoreCase = True
   regex.Multiline = False

   fiksoverskrifter = html

   Set matches = regex.Execute(html)
   For Each match in matches

   Next

   Set regex = Nothing
   Set matches = Nothing
   Set match = Nothing
End Function

What I want inside the For Each-loop is simply to swap the numbers, however, I'm not sure how to do that (I'm not even sure what properties the match-object exposes, and I've been unable to find it online).

How should I complete this function?

Alxandr

2 Answers

2

You're asking for pain trying to do this with a regex (not so much because of the replace, but because the increment would have to happen inside a single regex pattern). If it's only a case of replacing the headers, I'd use Replace():

' Count down so that a freshly written h3 is not bumped again to h5.
For i = 4 To 1 Step -1
    strHtml = Replace(strHtml, "<h" & CStr(i), "<h" & CStr(i + 2), 1, -1, vbTextCompare)
    strHtml = Replace(strHtml, "</h" & CStr(i), "</h" & CStr(i + 2), 1, -1, vbTextCompare)
Next

(The HTML spec only defines h1 to h6, so the loop above leaves h5 and h6 alone. Not sure whether you want to ignore them.)

If you want to stick with the regex option, I'd suggest using regex.Replace().

I know that in JavaScript you can pass a function as the replacement and have it called with the matched text, which is exactly what you would need here, but I've never seen this done in VBScript. Example: Use RegExp to match a parenthetical number then increment it
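For what it's worth, the VBScript RegExp object is generally reported to accept a function reference from GetRef as the second argument to Replace, so treat the following as a sketch to verify on your own machine and not as documented behaviour. The names ShiftHeadings and BumpHeading are hypothetical, and the callback signature (whole match, one argument per capture group, then position and source string) is an assumption based on how this is usually described. It reuses the pattern from the question.

```vbscript
' Hypothetical callback: assumed to receive the whole match, one argument
' per capture group, then the match position and the source string.
Function BumpHeading(whole, slash, level, rest, pos, source)
    ' Mid(...) picks the original "h" or "H" so the casing is preserved.
    BumpHeading = "<" & slash & Mid(whole, 2 + Len(slash), 1) & _
                  CStr(CInt(level) + 2) & rest & ">"
End Function

' Hypothetical wrapper around the pattern from the question.
Function ShiftHeadings(html)
    Dim regex
    Set regex = New RegExp
    regex.Pattern = "<(/?)h([0-9])(.*?)>"
    regex.IgnoreCase = True
    regex.Global = True    ' replace every heading tag, not just the first
    ShiftHeadings = regex.Replace(html, GetRef("BumpHeading"))
End Function
```

Because each tag is rewritten exactly once in a single pass, there is no risk of an h1 becoming h3 and then h5. Note that h5 and h6 would come out as h7 and h8, which are not valid HTML; narrowing the character class to [1-4] would avoid that.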

Edit 1:

Found the reference to the matches collection & match object:

http://msdn.microsoft.com/en-us/library/ms974570.aspx#scripting05_topic3

So you could read the matched text from the match.Value property, but you'd still need to resort to a second replace, I think.
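A minimal sketch of how the Match object could be used to finish the function from the question without a second replace: walk the matches, copy the text between them unchanged, and rebuild each heading tag from its submatches. It relies on the FirstIndex, Length, Value and SubMatches members of the Match object. Setting Global = True and the variable names result, pos and letter are additions that are not in the original code.

```vbscript
Function fiksoverskrifter(html)
    Dim regex, matches, match, result, pos, letter

    Set regex = New RegExp
    regex.Pattern = "<(/?)h([0-9])(.*?)>"
    regex.IgnoreCase = True
    regex.Global = True    ' without this, Execute returns only the first match

    result = ""
    pos = 1                ' 1-based position of the next character not yet copied

    Set matches = regex.Execute(html)
    For Each match In matches
        ' Copy the text between the previous tag and this one unchanged.
        ' FirstIndex is 0-based, Mid is 1-based.
        result = result & Mid(html, pos, match.FirstIndex + 1 - pos)

        ' SubMatches(0) = "/" or "", SubMatches(1) = level, SubMatches(2) = attributes.
        ' Take the "h" or "H" from the matched text to keep the original casing.
        letter = Mid(match.Value, 2 + Len(match.SubMatches(0)), 1)
        result = result & "<" & match.SubMatches(0) & letter & _
                 CStr(CInt(match.SubMatches(1)) + 2) & match.SubMatches(2) & ">"

        pos = match.FirstIndex + match.Length + 1
    Next

    ' Copy whatever follows the last tag.
    fiksoverskrifter = result & Mid(html, pos)
End Function
```

Run under cscript, `WScript.Echo fiksoverskrifter("<h1 id=""a"">Title</H1>")` should print `<h3 id="a">Title</H3>`. As written, h5 and h6 become h7 and h8, so the character class may need to be narrowed to [1-4] if those should be left alone.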

HeavenCore
  • I must say though, that you can't do it like this. Cause, then you'd replace h1 with h3, and then h3 with h5 later in the loop xD – Alxandr Jun 12 '12 at 10:27
  • @Alxandr: Easily fixed: use `For i = 4 To 1 Step -1` – AnthonyWJones Jun 12 '12 at 12:18
  • Also, you lacked `)` at the end of the lines, and you did comparison without ignoring the case, and you also never assigned the result to anything xD. I updated your code to something that should work in case anybody else needs this. – Alxandr Jun 13 '12 at 09:31
0

Here's a more general solution, probably off-topic, though, as it is in Perl, not in VBScript. Note I documented it to counter the write-only effect that regular expressions tend to have.

C:\TEMP :: more /t4 hdradj.pl
use strict;
use warnings;

# Make a subroutine that will adjust HTML headers
# (like h1, H2) by doing string replacement.
# Will only work properly for the range from 1 to 9.
sub mk_header_adjuster {
    my( $adjustment, $lower, $upper ) = @_;
# Compile substitution expression to adjust headers like h1, H2.
# Left-hand side matches headers from the range specified.
# Uses word boundaries (\b) and a character range (square brackets).
# Captures matches for "h|H" in $1 and for the level in $2.
# Right-hand side uses an eval (e-switch) to compute substitution.
# Case is ignored (i-switch), replacement is global (g-switch).
# Wraps expression in subroutine to modify first argument.
    my $str = <<"EOSUB";
sub {
    \$_[0] =~ s/\\b(h)([$lower-$upper])\\b/\$1 . (\$2 + $adjustment)/ige
}
EOSUB
#   print $str, "\n"; # debug output
    my $sub = eval $str; # compile expression
    die $@ if $@; # abort in case of errors in eval compilation
    return $sub;
}

# ==== main ====
# Test the above subroutine is working properly.
my $test_input = <<'EOS';
<h1>eins</h1>
<p>...
<h2 bla="blub">eins blub</h2>
< H2 >zwei </ H2>
<h3 >drei </h3>
<h4>vier </h4>
<h5>fünf </h5>
EOS

# Compile a header adjuster moving headers 2 to 4 by 2 levels down.
my $adjuster = mk_header_adjuster 2, 2, 4;
my $number_of_replacements = $adjuster->( $test_input );
printf STDERR
"Replaced %u header tags in supplied input.\n", $number_of_replacements;
print $test_input;
Lumi
  • How does that help when my main problem was with how regexes works in vbs? o.O – Alxandr Jun 12 '12 at 11:49
  • @Alxandr - Well, I thought your main problem was converting HTML files. Good to know it was something else. – Lumi Jun 12 '12 at 21:48