I want to find and replace values like:

<TAG>heading<foo></foo></TAG><foo>juergen</foo>

Goal:

<TAG>heading</TAG><foo>juergen</foo>

I want to remove only the <foo> tags that are between <TAG> and </TAG>.

Here is my attempt:

replaceAll("</?foo\\b[^>]*>", "");
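For reference, the attempt can be run as a small Java program. The class and method names are illustrative, not from the question. On the sample input it strips every opening and closing foo tag, including the pair outside <TAG>, and leaves the text `juergen` behind:

```java
public class AttemptDemo {
    // The question's pattern: an opening or closing <foo> tag with optional attributes.
    // It has no notion of context, so it matches foo tags anywhere in the string.
    static String stripAllFooTags(String input) {
        return input.replaceAll("</?foo\\b[^>]*>", "");
    }

    public static void main(String[] args) {
        String input = "<TAG>heading<foo></foo></TAG><foo>juergen</foo>";
        // Prints <TAG>heading</TAG>juergen: the <foo> outside <TAG> loses its tags too.
        System.out.println(stripAllFooTags(input));
    }
}
```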
Asked by user1181110 (edited by Dave Newton)
  • Nice attempt. What's going wrong? – jlordo May 27 '13 at 22:54
  • All foo tags are deleted, but I only need to delete the ones between <TAG> and </TAG>. – user1181110 May 27 '13 at 22:55
  • Use an XML parser for that problem. Regex is not the right tool for that job. – jlordo May 27 '13 at 22:57
  • You aren't trying to [parse HTML with regex](http://stackoverflow.com/a/1732454/712765), are you? – Old Pro May 27 '13 at 22:58
  • [Please stop generically whining about things that very often have a perfectly valid use case](http://stackoverflow.com/a/1733489/1729885). Parsing HTML with a regex is sometimes a good idea, sometimes not; stop trying to pass it off as evil by definition. – Niels Keurentjes May 27 '13 at 23:36
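jlordo's XML-parser suggestion can be sketched with the JDK's built-in DOM and transformer APIs. This is a minimal sketch, not a definitive implementation. It assumes the input becomes well-formed XML once wrapped in a single root element; the `<root>` wrapper and the `DomFooRemover` name are hypothetical. It removes every `<foo>` element (with its content) that has a `<TAG>` ancestor, then serializes the remaining nodes:

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class DomFooRemover {
    // True if any ancestor of the node is a <TAG> element.
    static boolean insideTag(Node node) {
        for (Node p = node.getParentNode(); p != null; p = p.getParentNode()) {
            if ("TAG".equals(p.getNodeName())) {
                return true;
            }
        }
        return false;
    }

    static String removeFooInsideTag(String fragment) {
        try {
            // The fragment has two top-level elements, so wrap it in a
            // hypothetical <root> element to get a well-formed document.
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader("<root>" + fragment + "</root>")));

            // getElementsByTagName returns a live list: collect first, remove afterwards.
            NodeList foos = doc.getElementsByTagName("foo");
            List<Node> toRemove = new ArrayList<>();
            for (int i = 0; i < foos.getLength(); i++) {
                if (insideTag(foos.item(i))) {
                    toRemove.add(foos.item(i));
                }
            }
            for (Node foo : toRemove) {
                foo.getParentNode().removeChild(foo);
            }

            // Serialize each child of the wrapper, which drops <root> again.
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            for (Node c = doc.getDocumentElement().getFirstChild(); c != null; c = c.getNextSibling()) {
                transformer.transform(new DOMSource(c), new StreamResult(out));
            }
            return out.toString();
        } catch (ParserConfigurationException | SAXException | IOException | TransformerException e) {
            throw new IllegalArgumentException("Could not process fragment", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(removeFooInsideTag("<TAG>heading<foo></foo></TAG><foo>juergen</foo>"));
    }
}
```

Unlike the regex approaches, this works on the element tree, so it does not care how deeply `<foo>` is nested inside `<TAG>`. The trade-off is that the input must actually be well-formed XML.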

3 Answers

Assuming that foo is empty, you can use:

<([^/][^>]*)></\1>

This searches for an opening tag with an adjacent closing tag of the same name.

You can extend it to allow whitespace between the two tags:

<([^/][^>]*)>\s*</\1>
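In Java source the backslashes must be doubled. Here is a minimal sketch that applies the whitespace-tolerant version with `String.replaceAll` (the class and method names are illustrative). Note that the pattern removes any empty element, not only those inside `<TAG>`. The backreference `\1` also only matches when the opening tag has no attributes, because `<foo id="x">` captures `foo id="x"`:

```java
public class EmptyElementRemover {
    // <([^/][^>]*)> captures the inside of an opening tag (anything not starting with '/'),
    // \s* allows whitespace, and </\1> requires a closing tag with exactly that text.
    static String removeEmptyElements(String input) {
        return input.replaceAll("<([^/][^>]*)>\\s*</\\1>", "");
    }

    public static void main(String[] args) {
        // Prints <TAG>heading</TAG><foo>juergen</foo>
        System.out.println(removeEmptyElements("<TAG>heading<foo></foo></TAG><foo>juergen</foo>"));
        // Whitespace-only content also counts as empty: prints <TAG>x</TAG>
        System.out.println(removeEmptyElements("<TAG>x<foo>  </foo></TAG>"));
    }
}
```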
Answered by Dancrumb

Possible duplicate of: RegEx match open tags except XHTML self-contained tags

Otherwise, here is the regex. Don't even ask me to explain it; I barely understand it myself. (This is JavaScript; some changes may be needed for Java.)

var txt = "<TAG>a<foo>b</foo>c</TAG>d<foo>e</foo>f<TAG>g<foo>h</foo>i</TAG>j<TAG>k</TAG>";
var res = txt.replace(/(<TAG>.*?)<foo>.*?<\/foo>(.*?<\/TAG>)/gm,"$1$2");
//                     (   $1   )               (    $2    )
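Here is one possible Java translation using `String.replaceAll` (the class and method names are illustrative). Group 1 captures `<TAG>` and the text up to the first `<foo>`. The `<foo>.*?</foo>` part in between is dropped. Group 2 captures the rest up to the next `</TAG>`, and `"$1$2"` glues the two groups back together. In Java the `/` needs no escaping, and the `m` flag can be dropped because the pattern has no `^` or `$`.

Note these limitations: it removes the `<foo>` content as well as the tags, at most one `<foo>` per `<TAG>` is removed per call, and `.` does not match line breaks. Also, when a `<TAG>` contains no `<foo>`, group 1 can run past its `</TAG>`, as the second call shows:

```java
public class TagScopedFooRemover {
    // Group 1: "<TAG>" plus everything up to the first <foo>.
    // Then a whole <foo>...</foo> element, which is dropped.
    // Group 2: everything up to and including the next </TAG>.
    static String removeFooInTag(String input) {
        return input.replaceAll("(<TAG>.*?)<foo>.*?</foo>(.*?</TAG>)", "$1$2");
    }

    public static void main(String[] args) {
        String txt = "<TAG>a<foo>b</foo>c</TAG>d<foo>e</foo>f<TAG>g<foo>h</foo>i</TAG>j<TAG>k</TAG>";
        // Prints <TAG>ac</TAG>d<foo>e</foo>f<TAG>gi</TAG>j<TAG>k</TAG>
        System.out.println(removeFooInTag(txt));

        // Caveat: the first <TAG> has no <foo>, so group 1 runs past its </TAG>
        // and the <foo> outside any <TAG> is removed. Prints <TAG>k</TAG>df<TAG>g</TAG>
        System.out.println(removeFooInTag("<TAG>k</TAG>d<foo>e</foo>f<TAG>g</TAG>"));
    }
}
```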
Answered by Isaac (edited by Community)
String result = searchText.replaceAll("(<f.*><.*o>)(?=<)", "");
  • The JavaScript alternative `txt.replace(/(<.*o>)(?=<)/g,"")` doesn't seem to remove anything: http://pastebin.com/N2EpQtJ9 – Isaac May 27 '13 at 23:36
  • @Mr.Russian it doesn't seem to handle this text: `keep!throwkeep!keep!` – Isaac May 28 '13 at 00:11
  • @Isaac - your example is considerably more complex, but try this `searchText.replace(/(.*<.*o>)(?=(.*(?=\<\/T.*)(?=\<\/)))/gm, "")` – Mr. Russian May 28 '13 at 00:42
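Running the one-line `replaceAll` answer on two inputs shows how closely it is tied to the question's exact sample. This is a minimal sketch; the class and method names are illustrative. On the question's input it removes the empty `<foo></foo>`. On the longer test string from the second answer, no `o>` is directly followed by `<`, so the lookahead never succeeds and the string comes back unchanged:

```java
public class LookaheadFooDemo {
    // The answer's pattern: "<f", anything, "><", anything, "o>",
    // matched only if the next character is "<" (the lookahead does not consume it).
    static String apply(String searchText) {
        return searchText.replaceAll("(<f.*><.*o>)(?=<)", "");
    }

    public static void main(String[] args) {
        // Prints <TAG>heading</TAG><foo>juergen</foo>
        System.out.println(apply("<TAG>heading<foo></foo></TAG><foo>juergen</foo>"));
        // No "o>" is followed by "<" here, so nothing is removed.
        System.out.println(apply("<TAG>a<foo>b</foo>c</TAG>d<foo>e</foo>f<TAG>g<foo>h</foo>i</TAG>j<TAG>k</TAG>"));
    }
}
```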