
I have an HTML string that holds multiple input tags. I need to identify 3 groups in that string and put these captured groups back into the string in a specific order.

Here is an example of an invalid string:

<input style="BORDER-BOTTOM: 0px; TEXT-ALIGN: center; BORDER-LEFT: 0px; PADDING-BOTTOM: 0px; BACKGROUND-COLOR: #fff6b7; MARGIN: 0px; PADDING-LEFT: 0px; PADDING-RIGHT: 0px; FONT-SIZE: 10px; BORDER-TOP: 0px; BORDER-RIGHT: 0px; PADDING-TOP: 0px" onkeyup=this.value=this.name.substring(0,9); name=smartTag_Campaign_Date value=Campaign_Date size=18>

The attributes name, value and size need to stay in the same string, but in a different order: size, value and name.

Unfortunately I can't use an HTML parser, so I need to stick to a regular expression, which I can't figure out myself.

Any ideas?

Bart Kiers
byte_slave
  • wait a second... why do they need to be reordered? – Joseph Marikle Nov 05 '11 at 19:52
  • It's a monkey-patch kind of thing that I need to use in a trigger in T-SQL, so that the markup gets reordered as I asked, basically for the sake of IE8 and IE9 compatibility. It's an existing system and I can't touch the code; I can only do the fix in the DB. Otherwise, of course, I would be using an HTML parser or similar. The way this system looks at these values is by using string replace at position x, and if the elements aren't in the order I asked for, the whole thing goes bananas – byte_slave Nov 05 '11 at 19:58

2 Answers


That sort of thing is practically impossible with REGEX. Give it up. Don't try it if you don't want the unholy child to weep the blood of virgins.

As far as I can see, a DOM/[X]HTML parser is your only option.

Madara's Ghost
  • Well, I was thinking I could extract the style, onkeyup, name, value and size groups and then rebuild the input tag using the attributes in the correct position.. – byte_slave Nov 05 '11 at 20:00
  • And what if you had to expand? What if you had a different order than the one you were expecting? You can't, and shouldn't, do this one with REGEX. Trust me. – Madara's Ghost Nov 05 '11 at 20:01
  • I trust you! But after some hours of extensive testing, this is what is happening and it's consistent. The HTML string I wrote is never edited manually by anyone; it's a field that the user selects from a RAD editor, which then generates that markup, so seen that way it's no big problem. But I agree with you... I know it's horrible, but... sometimes we just need to do some MacGyver tricks :) – byte_slave Nov 05 '11 at 20:08

A really simple/basic solution is to use regexes of the form below to capture each of your groups separately. Each one looks for the name of the attribute and captures the characters after it until it reaches either a closing bracket or whitespace. Note that these are very simplistic and would need to be modified to cope with legal variations in the HTML format, such as spaces on either side of the equals sign or quoted values. But it's a start. Regexr.com is a nice tool for building and testing regexes. The right side gives you a library of components to pick from, with definitions of what they mean in a regex.

As stated by Truth, this isn't a very flexible/scalable/proper way to do this type of thing, but it does get the job done depending on your needs.

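\svalue=([^\s>]+)
\sname=([^\s>]+)
\ssize=([^\s>]+)

So you can get a little more familiar with regexes, here is more explanation of each part:

\sattribute= matches one whitespace character, then the exact name of the attribute followed by an equals sign. The leading \s keeps value= from matching inside onkeyup=this.value=...
[^\s>] A negated character class: any single character that is neither whitespace (signified by \s) nor a literal > character. A plain dot would match any character (watch out for line breaks...) and, being greedy, would run on to the last space or > in the string instead of stopping at the end of the value.
(...+) The + tells it to look for 1 or more of those characters in sequence. Parentheses are used to capture the group.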
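
To show how the captured groups could be put back in the order size, value, name, here is a minimal sketch in Python. It is only an illustration, not something the target system uses: the helper names (attr_pattern, reorder_attributes) are made up for this example, and it assumes the three attribute values are unquoted and contain no whitespace, as in the sample tag from the question. It uses a negated character class so that each capture stops at the first whitespace or >, and a leading \s so that value= is not matched inside this.value=.

```python
import re

# Target order requested in the question.
ATTR_ORDER = ("size", "value", "name")


def attr_pattern(attr):
    """Build a pattern for one unquoted attribute (hypothetical helper).

    The leading \\s prevents "value=" from matching inside "this.value=...".
    [^\\s>]+ stops at the first whitespace or closing bracket.
    """
    return re.compile(r"\s" + re.escape(attr) + r"=([^\s>]+)")


def reorder_attributes(tag):
    """Move size, value and name to the end of the tag, in that order."""
    values = {}
    for attr in ATTR_ORDER:
        match = attr_pattern(attr).search(tag)
        if match is None:
            # Unexpected markup: leave it untouched rather than guess.
            return tag
        values[attr] = match.group(1)

    # Remove the first occurrence of each of the three attributes.
    stripped = tag
    for attr in ATTR_ORDER:
        stripped = attr_pattern(attr).sub("", stripped, count=1)

    end = stripped.rfind(">")
    if end == -1:
        return tag

    # Re-append the attributes in the requested order before the final ">".
    rebuilt = "".join(f" {attr}={values[attr]}" for attr in ATTR_ORDER)
    return stripped[:end].rstrip() + rebuilt + stripped[end:]


sample = (
    '<input style="PADDING-TOP: 0px" '
    "onkeyup=this.value=this.name.substring(0,9); "
    "name=smartTag_Campaign_Date value=Campaign_Date size=18>"
)
print(reorder_attributes(sample))
```

The sketch handles one tag at a time; a string holding several input tags would need to be split into tags first. The same three patterns should carry over to other regex engines, but that is untested here.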

Bryan