First of all I'm new to stackoverflow so I'm sorry if I posted this in the wrong section.

I need a regex that searches within an HTML tag and replaces each - with a _, e.g.:

<TAG-NAME>-100</TAG-NAME>

would become

<TAG_NAME>-100</TAG_NAME>

Note that the value inside the tag isn't affected.

Can anyone help?

Thanks.

  • Welcome to stackoverflow! Can you show us your own attempts? And which language or tool are you using? And finally the obligatory... [you shouldn't use regex to parse HTML at all](http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454) – Martin Ender Nov 20 '12 at 14:22
  • Well, I'm using something else to parse some XML files. It's not really an HTML tag, but I thought this would make things simpler to understand. I've tried different regular expressions and tested them on regexpal, but no luck: either I match the - of -100, or I also match the text of the tag... – user1839059 Nov 20 '12 at 14:25
  • @user1839059 regular expressions aren't any better for XML than HTML ;). Since you are using regexpal I assume that you use JavaScript? – Martin Ender Nov 20 '12 at 14:26
  • jQuery or anything else? If you want a good answer the technology your application uses is quite important. – Martin Ender Nov 20 '12 at 14:30

2 Answers

Since JavaScript is the language for DOM manipulation, you should generally consider parsing the XML properly and using JavaScript's DOM traversal functions instead of regular expressions.

Here is some example code on how to parse an XML document so that you can use the DOM traversal functions. Then you can traverse all elements and change their names. This automatically excludes text nodes, attributes, comments and all the other annoying things you don't want to change.
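The traversal step could look roughly like the sketch below. It is only an illustration under a few assumptions: the function name `renameHyphenatedElements` is made up for this example, the document is assumed not to use namespaced attributes, and the usage part relies on the browser's `DOMParser` and `XMLSerializer`, so it is skipped where those are unavailable.

```javascript
// Sketch: rename every element whose name contains a hyphen.
// Assumes a DOM Document without namespaced attributes.
function renameHyphenatedElements(doc) {
  // Copy the live collection first, because the tree is modified while iterating.
  const elements = Array.from(doc.getElementsByTagName('*'));
  for (const el of elements) {
    if (!el.tagName.includes('-')) continue;
    // The DOM cannot rename an element in place, so build a replacement.
    const renamed = doc.createElement(el.tagName.replace(/-/g, '_'));
    for (const attr of Array.from(el.attributes)) {
      renamed.setAttribute(attr.name, attr.value);
    }
    // Move (not copy) all children, so text nodes stay untouched.
    while (el.firstChild) renamed.appendChild(el.firstChild);
    el.parentNode.replaceChild(renamed, el);
  }
  return doc;
}

// Browser-only usage; skipped where DOMParser is unavailable (e.g. plain Node.js).
if (typeof DOMParser !== 'undefined') {
  const doc = new DOMParser().parseFromString('<TAG-NAME>-100</TAG-NAME>', 'application/xml');
  renameHyphenatedElements(doc);
  console.log(new XMLSerializer().serializeToString(doc));
}
```

Because the children are moved as nodes, a text value such as -100 is never inspected, which is the main advantage over a regex.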

If it has to be a regex, here is a makeshift solution. Note that it will fail badly if you have tags (or even just a >) inside attribute values or comments (in fact, it will also apply the replacement to comments):

str = str.replace(/-(?=[^<>]*>)/g, '_');

This will match a - if it is followed by a > without encountering a < before it. The concept is called a positive lookahead: the part in (?=...) must match, but it is not consumed and not replaced. The g modifier makes sure that all occurrences are replaced.
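As a quick check, here is the regex applied to the example from the question. The wrapper function `replaceHyphensInTags` is just a name chosen for this sketch.

```javascript
// Replace every hyphen that is followed by a ">" with no "<" or ">" in between.
function replaceHyphensInTags(str) {
  return str.replace(/-(?=[^<>]*>)/g, '_');
}

console.log(replaceHyphensInTags('<TAG-NAME>-100</TAG-NAME>'));
// <TAG_NAME>-100</TAG_NAME>

// Several hyphens in one tag are handled in a single pass,
// because the lookahead does not consume any characters.
console.log(replaceHyphensInTags('<MY-TAG-NAME>a-b</MY-TAG-NAME>'));
// <MY_TAG_NAME>a-b</MY_TAG_NAME>
```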

Note that this will apply the replacement to anything in front of a >, even attribute values. If you don't want that, you could also make sure that there is an even number of quotes between the hyphen and the closing >, like this:

str = str.replace(/-(?=[^<>"]*(?:"[^<>"]*"[^<>"]*)*>)/g, '_');

This will still change attribute names though.
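To make the difference concrete, the sketch below runs both patterns on a tag with a hyphenated attribute name and value. The sample input and the function names `replaceSimple` and `replaceQuoteAware` are invented for illustration.

```javascript
// Simple version: any hyphen in front of a ">" is replaced.
function replaceSimple(str) {
  return str.replace(/-(?=[^<>]*>)/g, '_');
}

// Quote-aware version: the hyphen must be followed by an even number
// of double quotes before the closing ">", i.e. it is not inside a value.
function replaceQuoteAware(str) {
  return str.replace(/-(?=[^<>"]*(?:"[^<>"]*"[^<>"]*)*>)/g, '_');
}

const sample = '<a-b c-d="e-f">x-y</a-b>';

console.log(replaceSimple(sample));
// <a_b c_d="e_f">x-y</a_b>

console.log(replaceQuoteAware(sample));
// <a_b c_d="e-f">x-y</a_b>
```

The attribute value e-f survives in the second case, while the attribute name c-d is still rewritten.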

Here is a regexpal demo that shows what works and what doesn't work. The comment behavior in particular is quite horrible. Of course this could be taken care of with an even more complex regex, but I guess you see where this is going? You should really, really use an XML parser!

Martin Ender
s/(\<[^\>]+\>)\-([^\<]+\<\/)/\1_\2/

Although I am not familiar with JS libraries, I am pretty sure there are better ones for parsing HTML.

SwiftMango
  • @m.buettner No :( But it's a general regex so JS should have something similar? – SwiftMango Nov 20 '12 at 14:37
  • yeah, the problem with your regex is though that it has to be run multiple times (the `g` modifier won't help) because matches cannot overlap. So only one `-` will be replaced. – Martin Ender Nov 20 '12 at 14:38
  • @m.buettner LOL don't know if OP need multiple replacement. But yeah `g` is needed if need to replace multiple of them. – SwiftMango Nov 20 '12 at 14:40
  • as I said, `g` won't help with your regex, because the matches for multiple hyphens in a single tag would overlap. – Martin Ender Nov 20 '12 at 14:42
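The overlap issue raised in these comments can be reproduced with a small sketch. The pattern below is a hypothetical one written for this illustration, not the regex from the answer: it consumes the whole tag, so with `g` it still replaces only one hyphen per tag per pass and has to be rerun until nothing changes, whereas a lookahead consumes nothing and finishes in one pass.

```javascript
// Hypothetical pattern: matches the entire tag, so each tag can only
// be matched once per pass, and only its first hyphen is replaced.
const consumingPattern = /<([^<>-]*)-([^<>]*)>/g;

function replaceOnce(str) {
  return str.replace(consumingPattern, '<$1_$2>');
}

// Workaround: repeat the replacement until the string stops changing.
function replaceUntilStable(str) {
  let previous;
  do {
    previous = str;
    str = replaceOnce(str);
  } while (str !== previous);
  return str;
}

// The lookahead only consumes the hyphen itself, so one pass is enough.
function replaceWithLookahead(str) {
  return str.replace(/-(?=[^<>]*>)/g, '_');
}

console.log(replaceOnce('<A-B-C>'));          // <A_B-C>
console.log(replaceUntilStable('<A-B-C>'));   // <A_B_C>
console.log(replaceWithLookahead('<A-B-C>')); // <A_B_C>
```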