I'm trying to move parentheses that sit just inside a certain tag to just outside it, i.e. when there is an opening parenthesis immediately after the opening tag or a closing parenthesis immediately before the closing tag. Example:

<italic>(When a parenthetical sentence stands on its own)</italic>
<italic>(When a parenthetical sentence stands on its own</italic>
<italic>When a parenthetical sentence stands on its own)</italic>

After the replacement, those lines should be:

(<italic>When a parenthetical sentence stands on its own</italic>)
(<italic>When a parenthetical sentence stands on its own</italic>
<italic>When a parenthetical sentence stands on its own</italic>)

However, strings like the next three should stay untouched.

<italic>(When) a parenthetical sentence stands on its own</italic>
<italic>When a parenthetical sentence stands on its (own)</italic>
<italic>When a parenthetical sentence stands (on) its own</italic>

But the following strings:

<italic>((When) a parenthetical sentence stands on its own</italic>
<italic>((When) a parenthetical sentence stands on its own)</italic>
<italic>(When) a parenthetical sentence stands on its own)</italic>
<italic>When a parenthetical sentence stands on its (own))</italic>
<italic>(When a parenthetical sentence stands on its (own)</italic>

should, after the replacement(s), become:

(<italic>(When) a parenthetical sentence stands on its own</italic>
(<italic>(When) a parenthetical sentence stands on its own</italic>)
<italic>(When) a parenthetical sentence stands on its own</italic>)
<italic>When a parenthetical sentence stands on its (own)</italic>)
(<italic>When a parenthetical sentence stands on its (own)</italic>

There could be nested tags inside the <italic>...</italic> tags, and a line can contain multiple <italic>...</italic> strings. Also, if there is a nested <inline-formula>...</inline-formula> tag inside <italic>...</italic>, it should be ignored.

Can I do this using regex? If not, what other way can I do this?

My approach is this (I am still not sure if it covers all possible cases):

1st step: <italic>( ---> (<italic>
Find <italic>( unless the tag is followed by a matching pair of parentheses that is not immediately followed by the closing tag. The match is allowed only within a single line.

Find what: (<(italic)>)(?!(\((?>(?:(?![()\r\n]).)++|(?3))*+\))(?!</$2\b))(\()
Replace with: $4$1

2nd step: )</italic> ---> </italic>)
Find )</italic> unless the tag is preceded by a matching pair of parentheses that is not immediately preceded by the opening tag. The match is allowed only within a single line.

Find what: (\))(?<!(?<!<(italic)>)(\((?>(?:(?![()\r\n]).)++|(?3))*+\)))(</2\b>)

Don_B
  • what have you tried? Show us some code please. – nilsK Dec 07 '17 at 16:08
  • @nilsK check the updated question – Don_B Dec 07 '17 at 16:12
  • what about `((When a parenthetical sentence stands on its own))`. Do we skip the replacing because there are already parentheses outside? – Flater Dec 07 '17 at 16:19
  • @Flater `((When a parenthetical sentence stands on its own))` should be replaced to `((When a parenthetical sentence stands on its own))` – Don_B Dec 07 '17 at 16:21
  • Please make sure to read https://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/ before going that route. – Alexei Levenkov Dec 07 '17 at 16:23

1 Answer

You could do this in a few different ways. I would start by defining when a tag is replaceable.

  1. We can move the leading ( out of the opening tag if the text in the tag starts with ( and that parenthesis is either closed right before the closing tag or never closed.
  2. We can move the trailing ) out of the closing tag if the text in the tag ends with ) and that parenthesis was either opened right after the opening tag or never opened.
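
As a rough illustration of these two rules only (this is not the parser shown further down), the check can be sketched as a small helper that pairs up parentheses with a stack. `ParenRules` and `Classify` are hypothetical names, the sketch assumes the text between the tags is already available as a plain string, and it uses C# 7 tuple syntax.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: decides which parentheses of a tag's inner text may move outside.
public static class ParenRules
{
    public static (bool MoveLeading, bool MoveTrailing) Classify(string text)
    {
        if (string.IsNullOrEmpty(text)) return (false, false);

        int last = text.Length - 1;
        int leadingPartner = -1;   // index of the ')' that closes text[0], if any
        int trailingPartner = -1;  // index of the '(' that opens text[last], if any
        var open = new Stack<int>();

        for (int i = 0; i < text.Length; i++)
        {
            if (text[i] == '(')
            {
                open.Push(i);
            }
            else if (text[i] == ')' && open.Count > 0)
            {
                int start = open.Pop();
                if (start == 0) leadingPartner = i;
                if (i == last) trailingPartner = start;
            }
        }

        // Rule 1: leading '(' is unclosed, or closed by the very last character.
        bool moveLeading = text[0] == '(' && (leadingPartner == -1 || leadingPartner == last);
        // Rule 2: trailing ')' is unopened, or opened by the very first character.
        bool moveTrailing = text[last] == ')' && (trailingPartner == -1 || trailingPartner == 0);
        return (moveLeading, moveTrailing);
    }
}
```

For example, `ParenRules.Classify("(When) a sentence stands on its own)")` reports that only the trailing parenthesis may move. Pairing with a stack rather than comparing counts matters for text such as `(a) b (c)`, where the counts balance but the first and last parentheses are not partners, so neither should move.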

This problem seems to lend itself to a parser approach that keeps track of the parenthesis state (was there a parenthesis at the beginning of the tag text, and how deeply nested are the parentheses at the current point). Writing a parser lets us build the output constructively, as opposed to searching with a regex and replacing substrings, and it is naturally recursive, which handles nested tags. Doing this with a regex seems a bit convoluted. Here's what I came up with.

using System;
using System.IO;
using System.Text;

namespace ParenParser {
    public class Program
    {
        public static Stream GenerateStreamFromString(string s)
        {
            MemoryStream stream = new MemoryStream();
            StreamWriter writer = new StreamWriter(stream);
            writer.Write(s);
            writer.Flush();
            stream.Position = 0;
            return stream;
        }

        // Entry point: copies text through unchanged and hands every '<' to ProcessTag.
        public static String Process(StreamReader s) { // root
            StringBuilder output = new StringBuilder();
            while (!s.EndOfStream) {
                var ch = Convert.ToChar(s.Read());
                if (ch == '<') {
                    output.Append(ProcessTag(s, true));
                } else {
                    output.Append(ch);
                }
            }

            return output.ToString();
        }

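        // Parses one tag (recursing into nested tags) and returns it with any movable
        // parentheses placed outside. skipOpeningBracket: the caller already consumed the '<'.
        public static String ProcessTag(StreamReader s, bool skipOpeningBracket = true) {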
            int currentParenDepth = 0;
            StringBuilder openingTag = new StringBuilder(), allTagText = new StringBuilder(), closingTag = new StringBuilder();
            bool inOpeningTag = false, inClosingTag = false;
            if (skipOpeningBracket) {
                inOpeningTag = true;
                openingTag.Append('<');
                skipOpeningBracket = false;
            }

            while (!s.EndOfStream) {
                var ch = Convert.ToChar(s.Read());
                if (ch == '<') { // start of a tag
                    var nextCh = Convert.ToChar(s.Peek());
                    if (nextCh == '/') { // closing tag!
                        closingTag.Append(ch);
                        inClosingTag = true;
                    } else if (openingTag.ToString().Length != 0) { // already seen a tag, recurse
                        allTagText.Append(ProcessTag(s, true));
                        continue;
                    } else {
                        openingTag.Append(ch);
                        inOpeningTag = true;
                    }
                }
                else if (inOpeningTag) {
                    openingTag.Append(ch);
                    if (ch == '>') {
                        inOpeningTag = false;
                    }
                }
                else if (inClosingTag) {
                    closingTag.Append(ch);
                    if (ch == '>') {
                        // Done!
                        var allTagTextString = allTagText.ToString();
                        if (allTagTextString.Length > 0 && allTagTextString[0] == '(' && allTagTextString[allTagTextString.Length - 1] == ')' && currentParenDepth == 0) {
                            return "(" + openingTag.ToString() + allTagTextString.Substring(1, allTagTextString.Length - 2) + closingTag.ToString() + ")";
                        } else if (allTagTextString.Length > 0 && allTagTextString[0] == '(' && currentParenDepth > 0) { // unclosed
                            return "(" + openingTag.ToString() + allTagTextString.Substring(1, allTagTextString.Length - 1) + closingTag.ToString();
                        } else if (allTagTextString.Length > 0 && allTagTextString[allTagTextString.Length - 1] == ')' && currentParenDepth < 0) { // unopened
                            return openingTag.ToString() + allTagTextString.Substring(0, allTagTextString.Length - 1) + closingTag.ToString() + ")";
                        } else {
                            return openingTag.ToString() + allTagTextString + closingTag.ToString();
                        }
                    }
                }
                else
                {
                    allTagText.Append(ch);
                    if (ch == '(') {
                        currentParenDepth++;
                    }
                    else if (ch == ')') {
                        currentParenDepth--;
                    }
                }
            }

            // Ran out of input before the tag was closed: emit what was read, unchanged.
            return openingTag.ToString() + allTagText.ToString() + closingTag.ToString();
        }

        public static void Main()
        {
            var testCases = new String[] {
                // Should change
                "<italic>(When a parenthetical sentence stands on its own)</italic>",
                "<italic>(When a parenthetical sentence stands on its own</italic>",
                "<italic>When a parenthetical sentence stands on its own)</italic>",

                // Should remain unchanged
                "<italic>(When) a parenthetical sentence stands on its own</italic>",
                "<italic>When a parenthetical sentence stands on its (own)</italic>",
                "<italic>When a parenthetical sentence stands (on) its own</italic>",

                // Should be changed
                "<italic>((When) a parenthetical sentence stands on its own</italic>",
                "<italic>((When) a parenthetical sentence stands on its own)</italic>",
                "<italic>(When) a parenthetical sentence stands on its own)</italic>",
                "<italic>When a parenthetical sentence stands on its (own))</italic>",
                "<italic>(When a parenthetical sentence stands on its (own)</italic>",

                // Other cases
                "<italic>(Try This on!)</italic>",
                "<italic><italic>(Try This on!)</italic></italic>",
                "<italic></italic>",
                "",
                "()",
                "<italic>()</italic>",
                "<italic>"
            };

            foreach(var testCase in testCases) {
                using(var testCaseStreamReader = new StreamReader(GenerateStreamFromString(testCase))) {
                    Console.WriteLine(testCase + " --> " + Process(testCaseStreamReader));
                }
            }
        }
    }
}

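The test case results look something like this:

<italic>(When a parenthetical sentence stands on its own)</italic> --> (<italic>When a parenthetical sentence stands on its own</italic>)
<italic>(When a parenthetical sentence stands on its own</italic> --> (<italic>When a parenthetical sentence stands on its own</italic>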
<italic>When a parenthetical sentence stands on its own)</italic> --> <italic>When a parenthetical sentence stands on its own</italic>)
<italic>(When) a parenthetical sentence stands on its own</italic> --> <italic>(When) a parenthetical sentence stands on its own</italic>
<italic>When a parenthetical sentence stands on its (own)</italic> --> <italic>When a parenthetical sentence stands on its (own)</italic>
<italic>When a parenthetical sentence stands (on) its own</italic> --> <italic>When a parenthetical sentence stands (on) its own</italic>
<italic>((When) a parenthetical sentence stands on its own</italic> --> (<italic>(When) a parenthetical sentence stands on its own</italic>
<italic>((When) a parenthetical sentence stands on its own)</italic> --> (<italic>(When) a parenthetical sentence stands on its own</italic>)
<italic>(When) a parenthetical sentence stands on its own)</italic> --> <italic>(When) a parenthetical sentence stands on its own</italic>)
<italic>When a parenthetical sentence stands on its (own))</italic> --> <italic>When a parenthetical sentence stands on its (own)</italic>)
<italic>(When a parenthetical sentence stands on its (own)</italic> --> (<italic>When a parenthetical sentence stands on its (own)</italic>
<italic>(Try This on!)</italic> --> (<italic>Try This on!</italic>)
<italic><italic>(Try This on!)</italic></italic> --> (<italic><italic>Try This on!</italic></italic>)
<italic></italic> --> <italic></italic>
 --> 
() --> ()
<italic>()</italic> --> (<italic></italic>)
<italic> --> <italic>
ameer
  • I'm getting an error on the line `while (!s.)` **Identifier expected** – Don_B Dec 08 '17 at 01:49
  • @Don_B should be while (!s.EndOfStream), not sure how it got cut off, updated – ameer Dec 08 '17 at 01:53
  • how do I do this in files in a path i.e. how do I combine the below codes `string path=@"D:\Test"; string[] files=Directory.GetFiles(path,"*.xml"); foreach (var file in files) { string testCases=File.ReadAllText(file); } foreach(var testCase in testCases) { using(var testCaseStreamReader = new StreamReader(GenerateStreamFromString(testCase))) { Process(testCaseStreamReader); } }` – Don_B Dec 08 '17 at 02:05
  • You can construct a StreamReader around a FileStream, [File.Open(...](https://msdn.microsoft.com/en-us/library/system.io.file.openread(v=vs.110).aspx) returns you a FileStream from a given path. – ameer Dec 08 '17 at 02:12
  • do you mind updating your code to do that, I'm not that good at using `Streams`.... – Don_B Dec 08 '17 at 02:16
  • `foreach (var file in files) { using(var fileStream = File.OpenRead(file)) { using(var fileStreamReader = new StreamReader(fileStream)) { Process(fileStreamReader); }}}` – ameer Dec 08 '17 at 02:22
  • Yes, you will have to save the returned string from Process back to a file. You can do something like `File.WriteAllText(file + ".processed", Process(fileStreamReader))`. I'm just trying to provide you with an approach for the replacement, I'd recommend you look up or research more into file streams if you need more help with them as I think that's outside the scope of the question. – ameer Dec 08 '17 at 02:36
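
Putting the snippets from these comments together, a minimal sketch of the file handling could look like the following. `BatchRewriter` and `RewriteAll` are hypothetical names, the transform is passed in as a delegate so that the `Process` method from the answer can be supplied, and the `.processed` suffix follows the suggestion in the last comment.

```csharp
using System;
using System.IO;

// Hypothetical helper that runs a StreamReader-based transform over every matching file.
public static class BatchRewriter
{
    // Writes each result next to its source file with a ".processed" suffix
    // and returns the paths of the files that were written.
    public static string[] RewriteAll(string directory, string pattern, Func<StreamReader, string> transform)
    {
        string[] files = Directory.GetFiles(directory, pattern);
        var outputs = new string[files.Length];

        for (int i = 0; i < files.Length; i++)
        {
            string result;
            // Disposing the reader also closes the underlying file stream.
            using (var reader = new StreamReader(File.OpenRead(files[i])))
            {
                result = transform(reader);
            }

            outputs[i] = files[i] + ".processed";
            File.WriteAllText(outputs[i], result);
        }

        return outputs;
    }
}
```

With the answer's class in scope, a call such as `BatchRewriter.RewriteAll(@"D:\Test", "*.xml", ParenParser.Program.Process)` would leave the original files untouched and write the rewritten text alongside them.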