
The end aim is to retrieve a group of DIVs with specific attributes/values, change their element type, and manipulate the attribute and its value.

I'd like to use a RegEx to test the following scenario (which I believe is correct, given the above aim):

  • for each opening DIV
  • that contains an attribute-to-check attribute
    • the attribute may or may not be the first attribute in the tag
  • and whose value contains value-contains
    • the value may or may not be the first value in the attribute (allowing for future extensions)
  • find its matching closing DIV
    • this DIV may itself contain other DIVs, including DIVs that will also be matched
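The scenario above can be sketched as follows, in JavaScript purely for illustration (the same approach should port to C# with `Regex.Matches`). Instead of asking one regex to find the matching closing DIV, which a regex alone handles poorly once tags nest, the sketch uses a regex only to test each opening tag and a stack to pair every opening DIV with its closing DIV. It assumes there is no `>` inside attribute values and no DIV tags hidden in comments or scripts; `findQualifyingDivs`, `QUALIFYING_OPEN` and `DIV_TAG` are hypothetical names.

```javascript
// A sketch, not a robust HTML parser. Assumes no '>' inside attribute
// values and no <div> text inside comments, scripts or CDATA sections.

// An opening tag qualifies if attribute-to-check appears anywhere in the tag
// and its quoted value contains value-contains: anywhere in the value.
const QUALIFYING_OPEN =
  /^<div\b[^>]*\battribute-to-check\s*=\s*(?:'[^']*value-contains:[^']*'|"[^"]*value-contains:[^"]*")/i;

// Any opening or closing div tag; group 1 is "/" for a closing tag.
const DIV_TAG = /<(\/?)div\b[^>]*>/gi;

function findQualifyingDivs(html) {
  const open = [];  // stack of currently open divs
  const found = [];
  for (const m of html.matchAll(DIV_TAG)) {
    if (m[1] !== '/') {
      // Opening tag: remember where it starts and whether it qualifies.
      open.push({ start: m.index, qualifies: QUALIFYING_OPEN.test(m[0]) });
      continue;
    }
    const top = open.pop();
    if (top === undefined) continue; // stray closing tag: ignore it
    if (top.qualifies) {
      const end = m.index + m[0].length;
      found.push({ start: top.start, end, html: html.slice(top.start, end) });
    }
  }
  // Inner divs close first; sort so results follow document order.
  return found.sort((a, b) => a.start - b.start);
}

const sample =
  "<div attribute-to-check='value-contains:2012-12-12' class='outer'>" +
  "<div>plain</div>" +
  "<div class='x' attribute-to-check=\"other value-contains:2012-01-01\">inner</div>" +
  "</div>" +
  "<div attribute-to-check='notvalid'>no</div>";

for (const d of findQualifyingDivs(sample)) console.log(d.start, d.html);
```

Because the stack tracks every DIV, a qualifying DIV nested inside another qualifying DIV is reported as well, which is what the last requirement asks for.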

So what have I tried? Using the following test markup:

<p>test1</p>
<DIV attribute-to-check='notvalid'>
  test2
</DIV>
<p>test3</p>
<DIV attribute-to-check='value-contains:2012-12-12' class='test4'>
  test5
</DIV>
<p>test6</p>
<DIV attribute-to-check='value-contains:2012-01-01' class='asd'>
  test6
</DIV>
<p>test7</p>

I've got as far as the following RegEx:

<?DIV.*?attribute-to-check='value-contains:.*?>(.*?)</DIV>

But this returns:

string 1: "<DIV attribute-to-check='notvalid'>test2</DIV>test3<DIV attribute-to-check='value-contains:2012-12-12' class='test4'>test5</DIV>"

string 2: "<DIV attribute-to-check='value-contains:2012-01-01' class='asd'>test6</DIV>"

I suspect it's bringing back test2 in string 1 because that DIV has the attribute-to-check attribute, even though its value is notvalid. So I'm wondering: am I getting the first DIV, then any number of characters until the regex finds any later attribute-to-check='value-contains:, and then the closing DIV after that?
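That diagnosis can be checked with a short JavaScript sketch; for the constructs used here (lazy quantifiers, negated character classes, `\b`) JavaScript and .NET regexes should behave the same. The in-tag `.*?` is free to run past the `>` that closes the first DIV's opening tag, so the match starts at the first DIV and keeps going until it reaches a later `attribute-to-check='value-contains:`. The tightened pattern below is an illustrative assumption, not the question's code: it replaces the in-tag `.*?` with `[^>]*` so a match cannot leave a single opening tag, and it drops the `?` after `<`, which made the `<` optional. It also assumes single-quoted values and input without line breaks (the reported matches show none). It is still not a full fix: `(.*?)</DIV>` stops at the first closing tag, so nested DIVs are cut short.

```javascript
// Test input from the question, collapsed onto one line.
const input =
  "<p>test1</p>" +
  "<DIV attribute-to-check='notvalid'>test2</DIV>" +
  "<p>test3</p>" +
  "<DIV attribute-to-check='value-contains:2012-12-12' class='test4'>test5</DIV>" +
  "<p>test6</p>" +
  "<DIV attribute-to-check='value-contains:2012-01-01' class='asd'>test6</DIV>" +
  "<p>test7</p>";

// The original pattern: `.*?` after `<?DIV` can cross the `>` of the
// first DIV until it finds a later attribute-to-check='value-contains:.
const original = /<?DIV.*?attribute-to-check='value-contains:.*?>(.*?)<\/DIV>/g;

// Tightened pattern: `[^>]*` cannot leave the current opening tag, and the
// quoted value itself must contain value-contains: (anywhere in the value).
const tightened =
  /<DIV\b[^>]*\battribute-to-check='[^']*value-contains:[^']*'[^>]*>(.*?)<\/DIV>/g;

const originalMatches = [...input.matchAll(original)].map(m => m[0]);
const tightenedMatches = [...input.matchAll(tightened)].map(m => m[0]);

console.log(originalMatches);  // first match spans the notvalid DIV and the test4 DIV
console.log(tightenedMatches); // only the two value-contains DIVs
```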

I am struggling to make progress from here and would appreciate any help from the community :)

Thanks, Dylan

  • This could be done fairly simply in jQuery. Is that available to you, or are you parsing the HTML outside of the browser? – ilivewithian Oct 02 '12 at 10:19
  • Who is generating these tags? If it's you, then regex isn't suitable. – Aram Kocharyan Oct 02 '12 at 10:22
  • Avoid lazy quantifiers, they're terribly slow. Rethink your regex completely. Anyway, what's the *first value* in the attribute? Is the attribute value some sort of list? What's its format? – MaxArt Oct 02 '12 at 10:23
  • http://stackoverflow.com/a/1732454/684934 –  Oct 02 '12 at 10:29
  • It needs to be done server side, in C#. I don't generate the end HTML and have no control over it. The attribute value is currently ONLY as shown above; however, I want to allow for it containing other values in future (if that over-complicates things, I can rule it out). – Dylan .. Mark Saunders Oct 02 '12 at 11:07


Please don't use regex for HTML. No, seriously, don't.

jQuery has excellent selector syntax for doing exactly what you want. Look here, where the documentation shows how to do it right at the top.

For example:

$('div[attribute-to-check*="value-contains:"]')

Will yield divs with your required attribute/value constraint.
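The operator inside a jQuery (CSS) attribute selector decides how the attribute value is compared. Per the CSS Selectors specification, `|=` matches only a value equal to the string or starting with the string followed by a hyphen, so `[attribute-to-check|="value-contains:"]` would not match `value-contains:2012-12-12`; `^=` is a prefix test, and `*=` matches the string anywhere in the value, which also covers the case where value-contains is not the first value. The plain-JavaScript sketch below mimics those comparison rules on strings only; `attributeMatches` is a hypothetical helper, not part of jQuery, and it ignores case-sensitivity flags.

```javascript
// Hypothetical helper: mimics how a CSS attribute selector operator compares
// an element's attribute value with the selector's string.
function attributeMatches(op, attrValue, selectorValue) {
  switch (op) {
    case '=':  // exact match
      return attrValue === selectorValue;
    case '~=': // whitespace-separated list contains the word
      return selectorValue !== '' && !/\s/.test(selectorValue) &&
        attrValue.split(/\s+/).includes(selectorValue);
    case '|=': // exact match, or prefix followed by a hyphen
      return attrValue === selectorValue || attrValue.startsWith(selectorValue + '-');
    case '^=': // prefix (an empty selector string never matches)
      return selectorValue !== '' && attrValue.startsWith(selectorValue);
    case '$=': // suffix
      return selectorValue !== '' && attrValue.endsWith(selectorValue);
    case '*=': // substring anywhere
      return selectorValue !== '' && attrValue.includes(selectorValue);
    default:
      throw new Error(`unsupported operator ${op}`);
  }
}

const value = 'value-contains:2012-12-12';
console.log(attributeMatches('|=', value, 'value-contains:')); // false
console.log(attributeMatches('*=', value, 'value-contains:')); // true
```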

Tim Lamballais
  • Tim, thanks for the response. I've used the jQuery library before and agree that it's very useful, but the end result is for me to perform this update within .NET (I had incorrectly tagged this with JavaScript). I can't parse this as HTML (or XHTML or XML), as the end "string" is not controlled by me and cannot be guaranteed to be validly formatted. – Dylan .. Mark Saunders Oct 02 '12 at 10:58
  • Hi Dylan, can you not use a parsing library like http://htmlagilitypack.codeplex.com/ for example? – Tim Lamballais Oct 02 '12 at 11:01
  • Hi Tim, with no access to the deployed environment I'm stuck with .NET and the end-page HTML (before my processing), as I have no control over the end page. I also have concerns about the complexity that could exist in the HTML and in the DIVs I need to change. I am considering retrieving only the DIVs I am concerned with, wrapping them in a container and testing whether they are valid XML. If they are, I can update the necessary node(s), remove the wrapper and substitute the result back in place of the RegEx match; if they are not, I would have to fall back on string manipulation, on the assumption that these DIVs should be valid. – Dylan .. Mark Saunders Oct 02 '12 at 11:06
  • Did you look at the HTML Agility Pack I linked? It's a library for .NET. – Tim Lamballais Oct 02 '12 at 11:11
  • Indeed, Tim. It looks great, but I can't deploy anything to the existing environment. – Dylan .. Mark Saunders Oct 02 '12 at 13:21
  • Ah, sorry about that. Good luck with your impossible task then :) – Tim Lamballais Oct 03 '12 at 12:01
  • Indeed, thanks Tim. This will clearly be a nightmare (if it's even possible) when the div I require is inside, or contains, nested DIVs. So I need to either get that agreed (unlikely) or arrange to get the content in a parseable format before it's pushed onto an HTML page where the format may not be parseable. Thanks for your comments all the same. – Dylan .. Mark Saunders Oct 03 '12 at 15:23