Hey everyone,
I'm off on another coding adventure. I started teaching myself some basic RegEx earlier today and made a little C# app that takes an HTML file and a listbox of RegExes, then uses those RegExes to replace or remove HTML tags.
I managed to write some working RegExes to clean up and remove the tags littering the tables, but I also need to remove the mess of hard-coded CSS styles and replace them with references to external stylesheets.
After a lot of trial and error, I finally came up with something that selects from <style type="text/css"> to </style>, but for some reason it doesn't treat separate style blocks as separate matches. Instead, the match starts at the first opening tag and runs straight through to the closing tag of the last block.
This is more of a curiosity than something I strictly need to know. The current behaviour should work fine for now, because I can just replace whatever is matched with a single <link> to the external CSS.
As of right now, my RegEx is this:
<style((\s+\w+(\s*=\s*(?:".*?"|'.*?'|[^'">\s]+))?)+\s*|\s*)>(.*?\r\n)*(</style>)
The first half was taken from here. The middle bit was what I struggled with most, as I had forgotten about \r\n, and of course the closing tag is matched verbatim.
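For reference, here is a minimal sketch of how I apply the pattern to swap whatever it matches for a single <link>. The class name, method name, and the styles.css href are placeholders made up for illustration, not the actual names in my app.

```csharp
using System;
using System.Text.RegularExpressions;

public static class StyleReplacer
{
    // The pattern from above, unchanged (quotes doubled for the verbatim string).
    public const string Pattern =
        @"<style((\s+\w+(\s*=\s*(?:"".*?""|'.*?'|[^'"">\s]+))?)+\s*|\s*)>(.*?\r\n)*(</style>)";

    // Hypothetical stylesheet reference; "styles.css" is an assumed file name.
    public const string LinkTag =
        "<link rel=\"stylesheet\" type=\"text/css\" href=\"styles.css\">";

    // Replaces every match of the pattern with the single <link> tag.
    public static string ReplaceStyleBlocks(string html)
    {
        return Regex.Replace(html, Pattern, LinkTag);
    }

    public static void Main()
    {
        // Made-up sample input with Windows line endings.
        string html =
            "<style type=\"text/css\">\r\n" +
            "p { color: red; }\r\n" +
            "</style>\r\n" +
            "<p>hi</p>";

        Console.WriteLine(ReplaceStyleBlocks(html));
    }
}
```

Because everything from the first <style> to the last </style> comes back as one match, the whole run of style blocks collapses into one <link>, which happens to be what I want here.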
Like I said, this works fine. My only qualm is what happens with this input:
<style type="text/css">
<!--
#wrapper #content #main2col .modbox tr td {
color: #3366cc;
border-top-style: solid;
border-right-style: solid;
border-bottom-style: solid;
border-left-style: solid;
}
#wrapper #content #main2col .modbox tr td p em {
color: #0a304e;
}
#wrapper #content #main2col .modbox tr td em br {
color: #0a304e;
}
#wrapper #content #main2col .modbox tr td em strong {
color: #0a304e;
}
#wrapper #content #main2col p strong {
color: #0a304e;
}
#wrapper #content #main2col table tr td strong {
color: #0a304e;
}
-->
</style>
<style type="text/css">
<!--
table.modbox {
font-size:9pt;
font-family:"Calibri", "sans-serif";
border-top-style: solid;
border-right-style: solid;
}
p.modbox {
margin-top:0in;
margin-right:0in;
margin-bottom:10.0pt;
margin-left:0in;
line-height:normal;
font-size:11.0pt;
font-family:"Calibri", "sans-serif";
}
#wrapper #content #main2col .modbox tr .modbox {
color: #09C;
font-style: normal;
}
#wrapper #content #main2col .modbox {
color: #3366cc;
}
#wrapper #content #main2col .modbox {
color: #3a5774;
}
#wrapper #content #main2col .modbox tr .modbox .MsoNormal .modbox {
color: #3a5774;
}
#wrapper #content #main2col .modbox {
color: #3a5774;
}
-->
</style>
<style type="text/css">
<!--
table.MsoTableGrid {
border:solid;
font-size:11.0pt;
font-family:"Calibri", "sans-serif";
}
p.MsoNormal {
margin-top:0in;
margin-right:0in;
margin-bottom:5pt;
margin-left:0in;
line-height:normal;
font-size:10pt;
font-family:"Calibri", "sans-serif";
}
-->
</style>
<style type="text/css">
<!--
table.modbox {
font-size:10.0pt;
font-family:"Times New Roman","serif";
}
-->
</style>
Only one match is returned. I'm trying to figure out why it doesn't stop at the first closing </style> tag. For the record, I tried adding (\r\n)? after the closing tag part, but that made no difference.
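In case it helps anyone reproduce this, here is a cut-down sketch of what I'm seeing. The class name and the two tiny style blocks are made up for illustration. Only the pattern is the real one from above.

```csharp
using System;
using System.Text.RegularExpressions;

public static class StyleBlockRepro
{
    // The pattern from the post, unchanged (quotes doubled for the verbatim string).
    public const string Pattern =
        @"<style((\s+\w+(\s*=\s*(?:"".*?""|'.*?'|[^'"">\s]+))?)+\s*|\s*)>(.*?\r\n)*(</style>)";

    // Number of separate matches the pattern finds in the input.
    public static int CountMatches(string html)
    {
        return Regex.Matches(html, Pattern).Count;
    }

    public static void Main()
    {
        // Two separate style blocks with \r\n line endings (made-up sample).
        string html =
            "<style type=\"text/css\">\r\n" +
            "p { color: red; }\r\n" +
            "</style>\r\n" +
            "<style type=\"text/css\">\r\n" +
            "em { color: blue; }\r\n" +
            "</style>\r\n";

        // I would expect 2 here, but I get 1.
        Console.WriteLine(CountMatches(html));

        // The single match ends at the LAST </style>, not the first one.
        Match m = Regex.Match(html, Pattern);
        int lastClose = html.LastIndexOf("</style>", StringComparison.Ordinal);
        Console.WriteLine(m.Index + m.Length == lastClose + "</style>".Length);
    }
}
```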
Again, today was my first day working with RegEx, so I'm really new to this and could be making a very simple mistake.
Can anyone explain what I've done wrong? Any assistance is greatly appreciated!