I have long text files (.srt subtitle files, actually), which unfortunately include a lot of irrelevant/distracting information.
All irrelevant text is enclosed within identical pairs of pilcrow (paragraph) characters: ¶
So for example, some text would look like this:
This is important, and ¶junk trash garbage rubbish¶ I would like to keep it.
Obviously, I want to remove everything between the ¶ characters and keep the rest. It doesn't matter whether the ¶ characters themselves are stripped or retained: if they're retained, it's trivial just to remove them directly with a subsequent search/replace - so I just need whatever pattern match is easiest.
Note that the ¶ symbols come in identical pairs, so it's not as simple as, for example, stripping out everything between [asymmetrical characters].
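Because the opening and closing delimiters are the same character, the usual trick is a lazy (non-greedy) quantifier, so each ¶ pairs with the *next* ¶ rather than the *last* one on the line. Here is a minimal sketch in Python (my choice of language; the question itself is tool-agnostic), using a made-up sample line with two junk spans:

```python
import re

line = "Keep ¶junk¶ this ¶more junk¶ text."

# Greedy: '.*' runs on to the LAST ¶ on the line, swallowing " this " as well.
greedy = re.sub(r"¶.*¶", "", line)

# Lazy: '.*?' stops at the NEXT ¶, so each opening ¶ pairs with the nearest closing one.
lazy = re.sub(r"¶.*?¶", "", line)

print(greedy)  # Keep  text.
print(lazy)    # Keep  this  text.
```

The doubled spaces are left behind by the removed spans; they can be collapsed afterwards with another search/replace if needed.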
I'm not working on any particular platform. In fact, I was hoping to do it with a web-based tool like this one.
I just need the regex - if anyone can assist! Alternatively, if there are better ways than regex, I'd be grateful for suggestions.
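On the "better ways than regex" point: if the ¶ characters really always come in pairs, a plain split works too. Splitting on ¶ yields alternating pieces (keep, junk, keep, junk, ...), so keeping every other piece drops all the junk. A sketch, where `strip_pilcrow_spans` is just a hypothetical helper name:

```python
def strip_pilcrow_spans(text: str) -> str:
    """Remove every ¶...¶ span. Assumes ¶ always occurs in matched pairs."""
    # Even-indexed pieces lie outside any pair; odd-indexed pieces lie inside one.
    pieces = text.split("¶")
    return "".join(pieces[0::2])


sample = "This is important, and ¶junk trash garbage rubbish¶ I would like to keep it."
print(strip_pilcrow_spans(sample))  # This is important, and  I would like to keep it.
```

Since `split` doesn't care about line breaks, spans that run across several lines are handled the same way. The caveat: a stray unpaired ¶ would cause everything after it to be treated as junk.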
Edit: It has been suggested that this question (Remove text in-between delimiters in a string (using a regex?)) answers what I'm looking for. Thanks, but unfortunately it doesn't. That relates to using it in C# (which I don't know), and the answers to that question do not explain exactly how to replicate what I want. I want it to work in the online tool to which I linked.
Update: A good answer works, but only if the unwanted text appears in-line. I also need it to remove text where the entire line is unwanted:
779
00:35:52,216 --> 00:35:54,784
I miss him already.

780
00:36:00,291 --> 00:36:03,727
¶ If you ever need someone ¶

665
00:30:21,821 --> 00:30:25,589
¶ Feels like
sometimes you want to ¶
So I want to remove everything that appears between the ¶ symbols, regardless of where they appear in the line, and regardless of the presence of line breaks.
Second update: After accepting the answer, I found it doesn't work in every case. In the example here, the regex provided fails on the first multi-line instance. I have no clue what's wrong. I just want line breaks (or any other characters) to be irrelevant to the match. The request is simply to delete everything between pairs of ¶ characters, regardless of where they appear and regardless of what lies between them.
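A likely cause, assuming the regex in question used `.` between the two ¶: by default `.` does not match a newline, so a ¶ pair split across two lines is never matched. A Python sketch using the multi-line cue from the sample above:

```python
import re

block = (
    "665\n"
    "00:30:21,821 --> 00:30:25,589\n"
    "¶ Feels like\n"
    "sometimes you want to ¶\n"
)

# Without flags, '.' stops at '\n', so the two ¶ on different lines never pair up.
no_flag = re.findall(r"¶.*?¶", block)

# re.DOTALL lets '.' match '\n' too, so the whole two-line span is found.
with_flag = re.findall(r"¶.*?¶", block, re.DOTALL)

print(no_flag)    # []
print(with_flag)  # ['¶ Feels like\nsometimes you want to ¶']
```

In most regex flavors (PCRE, JavaScript, Python) the same switch is the `s` ("dotall"/"single line") flag; the other common workaround is to replace `.` with a class that matches everything, such as `[\S\s]`.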
Final (hopefully) update
For reference, and thanks to user MDR, we have the solution: (¶[\S\s]*?¶)
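For anyone scripting this rather than using an online tool, here is a sketch applying that same pattern to the subtitle sample from the first update. `[\S\s]` matches any character, newlines included, so no flag is needed, and the capturing group is harmless but not required for deletion:

```python
import re

# The final pattern from above; the lazy '*?' pairs each ¶ with the next one.
PILCROW_SPAN = re.compile(r"(¶[\S\s]*?¶)")

srt = (
    "779\n"
    "00:35:52,216 --> 00:35:54,784\n"
    "I miss him already.\n"
    "\n"
    "780\n"
    "00:36:00,291 --> 00:36:03,727\n"
    "¶ If you ever need someone ¶\n"
    "\n"
    "665\n"
    "00:30:21,821 --> 00:30:25,589\n"
    "¶ Feels like\n"
    "sometimes you want to ¶\n"
)

cleaned = PILCROW_SPAN.sub("", srt)
print(cleaned)
```

Cues whose text was entirely junk are left with an empty text line under their timestamp; whether those need further cleanup depends on the player. For a real file, the string would come from something like `pathlib.Path("movie.srt").read_text(encoding="utf-8")` (the filename is hypothetical).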