
I am trying to copy the uid attribute from lsec and prepend it to the sbsecloc and sbsecanchor attribute values of all subsequent lsbsec elements.

Input
------------------------------------------------
<lsec uid='copy_1' d='1' n='' anchor='1'>
<name>Normal Text</name>
<p>Normal Text
<lsbsec d='1' sbsecloc='(1)' sbsecanchor='(1)'>
<p>Normat Text</lsbsec>
<lsbsec d='2' sbsecloc='(2)' sbsecanchor='(2)'>
<p>Normat Text</lsbsec>
<lsbsec d='3' sbsecloc='(3)' sbsecanchor='(3)'>
<p>Normat Text</lsbsec>
<lsbsec d='4' sbsecloc='(4)' sbsecanchor='(4)'>
<p>Normat Text</lsbsec>
</lsec>

Output
------------------------------------------------
<lsec uid='copy_1' d='1' n='' anchor='1'>
<name>Normal Text</name>
<p>Normal Text
<lsbsec d='1' sbsecloc='copy_1(1)' sbsecanchor='copy_1(1)'>
<p>Normat Text</lsbsec>
<lsbsec d='2' sbsecloc='copy_1(2)' sbsecanchor='copy_1(2)'>
<p>Normat Text</lsbsec>
<lsbsec d='3' sbsecloc='copy_1(3)' sbsecanchor='copy_1(3)'>
<p>Normat Text</lsbsec>
<lsbsec d='4' sbsecloc='copy_1(4)' sbsecanchor='copy_1(4)'>
<p>Normat Text</lsbsec>
</lsec>

I am using a foreach loop to generate the output, which works fine. But with more than 100 pages of data and many instances to replace, it takes a long time.

textBox8.Text = Regex.Replace(textBox8.Text, @"\t|\n|\r", "");
foreach (int lines in textBox8.Text)
{
    textBox8.Text = Regex.Replace(textBox8.Text, "<lsec uid='(.*)' d='(.*)' (.*) anchor='(.*)'>(.*)<lsbsec d='(.*)' sbsecloc='(.*)' sbsecanchor='(.*)'>", "<lsec uid='$1' d='$2' $3 anchor='$4'>$5<lsbsec d='$6' loc='$1$7' anchor='$1$8'>");
}

The above code replaces the last instance (sbsecloc|sbsecanchor) first.

Is there a better way to replace?

    Although your code isn't html directly, you still use tags and thus this answer somewhat applies: https://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454 – Bauss Nov 06 '17 at 16:30

1 Answer


First of all, when you're processing a large volume of data, you're going to run into slowdown no matter what you do.

However, your real problem is that you are trying to fit a square peg into a round hole. XML isn't a regular language any more than HTML is; regexes cannot handle all the vagaries and edge-cases of SGML and its derivatives.

What you should do is use an XML parser. The System.Xml.Linq namespace should do the trick; just go through every descendant element named "lsbsec", grab its "sbsecloc" and "sbsecanchor" attributes, and prepend the uid of the enclosing "lsec" to their values.
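
A minimal sketch of that approach, under some assumptions: the class name SubsectionFixer and the method PrefixWithUid are hypothetical names chosen for illustration, and the input must be well-formed XML with a single root element. The sample in the question leaves its <p> tags unclosed, so XDocument.Parse would reject it as posted; those tags would need to be closed first.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper class; the names are illustrative, not from the question.
public static class SubsectionFixer
{
    // Prefixes sbsecloc and sbsecanchor on every lsbsec element with the uid
    // of its nearest enclosing lsec. Assumes the input is well-formed XML.
    public static string PrefixWithUid(string xml)
    {
        XDocument doc = XDocument.Parse(xml);

        foreach (XElement sub in doc.Descendants("lsbsec"))
        {
            // Ancestors() yields the closest ancestor first.
            XElement lsec = sub.Ancestors("lsec").FirstOrDefault();
            string uid = lsec?.Attribute("uid")?.Value;
            if (string.IsNullOrEmpty(uid))
            {
                continue; // no enclosing lsec with a uid: leave it untouched
            }

            foreach (string name in new[] { "sbsecloc", "sbsecanchor" })
            {
                XAttribute attr = sub.Attribute(name);
                if (attr != null)
                {
                    attr.Value = uid + attr.Value;
                }
            }
        }

        // DisableFormatting keeps the serializer from adding indentation.
        return doc.ToString(SaveOptions.DisableFormatting);
    }
}
```

It could be called as textBox8.Text = SubsectionFixer.PrefixWithUid(textBox8.Text). The document is parsed once and each lsbsec is visited once, rather than re-running a regex over the whole text on every loop iteration. Note that the serializer writes attribute values in double quotes, so the single quotes in the input are not preserved.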