Regular Expressions - Select the Second Match

Question

I have a txt file with <i> and </i> tags between words that I would like to remove using EditPad.

For example, I'd like to keep the tags when the line looks like this:

<i>Phrases and words.</i>

And I'd like to remove the <i> and </i> tags inside the phrase when it looks like this:

<i>Phrases</i>and<i> words.</i>
<i>Phrases</i>and <i>words.</i>

I tried to do that using regex, but I couldn't get it to work.

Since the inner tags sit next to a space or a word character, I can find the lines that have the doubled tags with:

/ <i>|<\/i> /

but this way I can't just replace the matches with nothing; I have to edit each line I find by hand.

Is there any way to accomplish that?

* Edited *

Another example of lines found in the subtitle text:

<i>- find me on the chamber.</i>
- What? <i>Go. Go, go, go!</i>

score 1 · Accepted Answer · 2017-06-09T02:08:12.053

Rule number one: you can't parse HTML with regex.

That being said, if you know each line follows a certain pattern, you can usually hack something together to work. ;)

If I've understood correctly, it looks like you can simply remove all <i> and </i> tags that aren't at the beginning or end of a line. In that case, one method you could try is the following regex:

(?<=.)\<\/?i\>(?=.)

This will match the tags, using a lookbehind and a lookahead to make sure that we aren't at the start or end of a line (by checking that another character exists before and after the tag). Note that characters matched inside a lookahead or lookbehind aren't part of the match, so they won't be replaced when you search and replace.

Disclaimer: this works on regex101, but Notepad++ may differ somewhat from the PCRE regex style.
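For anyone who wants to check the behaviour outside an editor, here is a minimal sketch in Python (my choice of language, not something the question uses). The function name `strip_inner_tags` is made up for illustration, and Python's `re` module is not the same flavor as EditPad or Notepad++, so treat it as a way to see what the lookarounds do rather than a guarantee for either editor.

```python
import re

# The answer's pattern. Angle brackets are left unescaped: Python does not
# need the escape, and some flavors (e.g. GNU) read \< and \> as word boundaries.
INNER_TAG = re.compile(r"(?<=.)</?i>(?=.)")


def strip_inner_tags(line: str) -> str:
    """Remove <i> and </i> tags that have a character on both sides."""
    return INNER_TAG.sub("", line)


if __name__ == "__main__":
    samples = [
        "<i>Phrases and words.</i>",        # tags only at the edges: kept
        "<i>Phrases</i>and<i> words.</i>",  # inner tags: removed
        "<i>Phrases</i>and <i>words.</i>",  # inner tags: removed
    ]
    for sample in samples:
        print(strip_inner_tags(sample))
```

One caveat: a line such as `- What? <i>Go. Go, go, go!</i>` from the question loses its opening tag, because that `<i>` is not at the start of the line. The pattern only looks at the tag's position, not at whether the tags are balanced.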

update to work with Editpad

EDIT: since the question is actually about how to do this in EditPad, here is a modified alternative:

Try searching for the regex `(.)\<\/?i\>(.)`. This will match (and capture) exactly one character before and one character after an <i> or </i> tag.

When replacing, use backreferences to replace the entire match with the two captured characters; a replacement string of `\1\2` should work.
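The capture-and-backreference idea can be sketched in Python as well. This is an illustration under assumptions: the function name `strip_inner_tags_captured` is hypothetical, and Python's `re` stands in for EditPad's engine, which may behave differently.

```python
import re

# One captured character, the tag, one captured character.
# Angle brackets are unescaped because Python does not need the escape.
CAPTURING = re.compile(r"(.)</?i>(.)")


def strip_inner_tags_captured(line: str) -> str:
    """Replace 'X<tag>Y' with 'XY', putting the two captured characters back."""
    return CAPTURING.sub(r"\1\2", line)


if __name__ == "__main__":
    print(strip_inner_tags_captured("<i>Phrases</i>and<i> words.</i>"))
    # Two tags separated by a single character: only the first is removed,
    # because the character between them was consumed by the first match.
    once = strip_inner_tags_captured("a<i>b</i>c")
    print(once)
    print(strip_inner_tags_captured(once))  # a second pass removes the rest
```

Unlike a lookaround, a capture group consumes the neighbouring characters, so a tag that sits one character after another tag is skipped on a single pass. Running the replacement a second time picks it up.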

edited Jun 09 '17 at 02:08

answered Jun 08 '17 at 23:53

Thank you for your reply. It's a subtitle file. Unfortunately it didn't work. I'm using EditPad, a program similar to Notepad++. I believe these programs use the JavaScript regex style – Commentator Jun 09 '17 at 01:27
@Comentarist why did you tag your question with `notepad++` then? Two alternatives: use Notepad++ or another more powerful editor for this particular operation, or modify this regex to work with JavaScript-style regex (regex101 says lookbehinds aren't in JS regex) – Jun 09 '17 at 01:31
I tagged it because code that works in Notepad++ might work in my editor too, which is not the case here. Can you modify this regex to work with the JavaScript style? If I could, I would. – Commentator Jun 09 '17 at 01:36
It didn't work properly, but thank you. I believe I can do the job this way, though almost as manually. Sometimes it matches 2 characters before the tag and the `\1\2` backreference replacement eats 1 letter. That is my fault: I had to change the code to `[^- ](.)\<\/?i\>(.)` because the dialogue lines with `- Text...` were matching – Commentator Jun 09 '17 at 03:19
@Comentarist If you are adding the `[^- ]` part to the start, you could put that within the first parentheses to make `([^- ].)...`, so it won't eat that first character. – Jun 09 '17 at 03:37
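The difference between the two variants discussed in these comments can be checked with a small Python sketch. The constant names and the sample line `- <i>Go, go!</i>` are made up for illustration, and Python's `re` is only a stand-in for EditPad's engine.

```python
import re

# Variant from the comment: the guard character sits outside the groups,
# so it is matched but never put back by \1\2.
EATS_CHAR = re.compile(r"[^- ](.)</?i>(.)")

# Suggested fix: the guard is inside the first group, so \1 restores it.
KEEPS_CHAR = re.compile(r"([^- ].)</?i>(.)")

if __name__ == "__main__":
    line = "<i>Phrases</i>and<i> words.</i>"
    print(EATS_CHAR.sub(r"\1\2", line))   # letters before each tag go missing
    print(KEEPS_CHAR.sub(r"\1\2", line))  # only the inner tags are removed

    # Hypothetical dialogue line: the tag after "- " is left alone.
    print(KEEPS_CHAR.sub(r"\1\2", "- <i>Go, go!</i>"))
```

With the guard inside the parentheses, `\1` holds two characters instead of one, which is why nothing is lost in the replacement.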
