Regex to find immediate duplicate tag

Question

I am using Notepad++ and need to find and remove an immediately duplicated HTML tag, as shown below.

Actual

<a href="www.google.com"><a href="www.google.com">www.google.com</a></a>

Required

<a href="www.google.com">www.google.com</a>

I have a regex that finds duplicates on consecutive lines, but here the duplicate occurs within a single line.

Please help.

Somebody had to do it: http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags – squiguy, May 06 '13 at 05:57
@squiguy The OP isn't parsing HTML, simply trying to match a pattern in a text file for replacement in the program. – AbsoluteƵERØ, May 06 '13 at 06:08
@AbsoluteƵERØ You are right, but you have to admit that after all this time, it's still so much fun to read. – Lieven Keersmaekers, May 06 '13 at 06:22

score 2 · Answer 1 · answered May 06 '13 at 06:06

Find:

(<(\w+)(\s[^>]*)?>)\1(.*)(<\/\2>)\5

Replace:

\1\4\5

Tested in Sublime.
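
For anyone who wants to check the pattern outside an editor, here is a minimal sketch in Python. It assumes Python's `re` module treats these constructs the same way the editor's engine does, which I have not verified for Notepad++. The function name `collapse_duplicate_tag` is made up for illustration.

```python
import re

# Pattern from this answer:
#   group 1 = opening tag, group 2 = tag name, group 3 = attributes,
#   group 4 = text between the tags, group 5 = closing tag.
PATTERN = re.compile(r'(<(\w+)(\s[^>]*)?>)\1(.*)(<\/\2>)\5')


def collapse_duplicate_tag(line: str) -> str:
    """Keep one opening tag, the inner text, and one closing tag."""
    return PATTERN.sub(r'\1\4\5', line)


sample = '<a href="www.google.com"><a href="www.google.com">www.google.com</a></a>'
print(collapse_duplicate_tag(sample))
# prints: <a href="www.google.com">www.google.com</a>
```

A line without a doubled tag is left unchanged, because the `\1` backreference requires the same opening tag twice in a row.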

Albert Xing

Casimir et Hippolyte · Answer 2 · 2013-05-06T06:22:52.197

1

For this kind of "double link" you can use:

find: <(a [^>]+)>(<\1>.*?</a>)</a>
replace: \2

For all tags use:

find: <((\w+)[^>]*)>(<\1>.*?</\2>)</\2>
replace: \3

(Both work with a recent version of Notepad++.)
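
As a rough check of the general pattern, here is a small Python sketch. It assumes Python's `re` module is close enough to the Notepad++ engine for these constructs, and `unwrap_double_tags` is just an illustrative name. The lazy `.*?` matters when a line holds more than one doubled tag, since each match then stops at the first closing pair.

```python
import re

# General pattern from this answer:
#   group 1 = tag name plus attributes, group 2 = tag name,
#   group 3 = the inner element that is kept.
DOUBLE_TAG = re.compile(r'<((\w+)[^>]*)>(<\1>.*?</\2>)</\2>')


def unwrap_double_tags(line: str) -> str:
    """Replace each doubled element with its inner copy."""
    return DOUBLE_TAG.sub(r'\3', line)


line = ('<a href="x"><a href="x">x</a></a> and '
        '<a href="y"><a href="y">y</a></a>')
print(unwrap_double_tags(line))
# prints: <a href="x">x</a> and <a href="y">y</a>
```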

edited May 06 '13 at 06:22

answered May 06 '13 at 06:10

score 1 · Answer 3 · answered May 06 '13 at 07:20

Search Pattern:

.*">(<.*>)<\/a>

Replace:

\1
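
One caveat worth testing before running this on a whole file: the pattern never checks that the outer and inner tags are the same, so it can also fire on lines that have no duplicate. The Python sketch below shows both cases. It assumes Python's `re` module behaves like the editor here, and the second input line is a made-up example.

```python
import re

# Pattern from this answer: group 1 = everything from the inner
# opening tag up to the last '>' that is followed by '</a>'.
LOOSE = re.compile(r'.*">(<.*>)<\/a>')

doubled = '<a href="www.google.com"><a href="www.google.com">www.google.com</a></a>'
not_doubled = '<li class="x"><a href="y"><b>y</b></a></li>'  # hypothetical line

print(LOOSE.sub(r'\1', doubled))
# prints: <a href="www.google.com">www.google.com</a>

print(LOOSE.sub(r'\1', not_doubled))
# prints: <b>y</b></li>   (the link and the opening <li> are lost)
```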

Dick Faps

Civa · Answer 4 · 2013-05-06T06:12:59.847

0

Try this pattern:

(<(\w+)(\s[^>]*)?>)(\s|\n|\t)*\1(.*)(<\/\2>)(\s|\n|\t)*\6

replace \1 and \6

edited May 06 '13 at 06:12

answered May 06 '13 at 06:01

This doesn't keep the data between the tags in Notepad++. It does get rid of the duplicate tag though. Should say replace \1\5\6. – AbsoluteƵERØ May 06 '13 at 06:20
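
The difference between the two replacements can be reproduced with a short Python sketch. It assumes Python's `re` module matches this pattern the way Notepad++ does, and the sample line with spaces between the tags is made up. In this pattern, group 5 is the text between the tags, so leaving it out of the replacement drops the link text.

```python
import re

# Pattern from this answer:
#   group 1 = opening tag, group 5 = text between the tags,
#   group 6 = closing tag; groups 4 and 7 are optional whitespace.
SPACED_DUPLICATE = re.compile(
    r'(<(\w+)(\s[^>]*)?>)(\s|\n|\t)*\1(.*)(<\/\2>)(\s|\n|\t)*\6'
)

sample = '<a href="www.google.com"> <a href="www.google.com">www.google.com</a> </a>'

drops_text = SPACED_DUPLICATE.sub(r'\1\6', sample)    # replacement as posted
keeps_text = SPACED_DUPLICATE.sub(r'\1\5\6', sample)  # replacement from the comment

print(drops_text)
# prints: <a href="www.google.com"></a>
print(keeps_text)
# prints: <a href="www.google.com">www.google.com</a>
```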
