For example, I have this:

"Was? Wo war ich? Ach ja.<pa>">

I need to create a new text file that will contain only:

Was? Wo war ich? Ach ja.

I have a large file (about 43 MB). I need to scan the whole file, find every place that starts with " and ends with <pa>", and get the string between these tags.

This is my code so far:

private void retrivingTestText()
{
    w = new StreamWriter(retrivedTextFile);
    string startTag = "\"";
    string endTag = "&lt;pa&gt;";
    int startTagWidth = startTag.Length;
    int endTagWidth = endTag.Length;
    string text = "\"Was? Wo war ich? Ach ja.&lt;pa&gt;\">";

    int begin = text.IndexOf(startTag);
    int end = text.IndexOf(endTag, begin + startTagWidth);

    // Substring takes (startIndex, length), so the length is the distance
    // between the end of the start tag and the beginning of the end tag.
    string result = text.Substring(begin + startTagWidth, end - begin - startTagWidth);
    w.WriteLine(result);
    w.Close();
}

But now I need to do this on a large 43 MB XML file. In the constructor I already have StreamReader r; and string f; and then I did:

r = new StreamReader(@"D:\New folder (22)\000004aa.xml");
f = r.ReadToEnd();

Now I need to use it with the code above to extract every string in the large file that sits between the startTag and the endTag, not just one specific text.

Second, I need another function so that, after I make changes, it puts all the extracted strings back in the right places, where they were before, between the startTag and the endTag.

Thanks.

user1352869
  • You may want to look into regular expressions... they make this sort of thing much easier. – Cameron Apr 24 '12 at 05:33
  • Is your data valid XML? Can you use an XmlReader to process the file? See http://msdn.microsoft.com/en-us/library/9d83k261.aspx – Andrew Kennan Apr 24 '12 at 05:39
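
As a rough illustration of the regular-expression suggestion in the comments, here is a minimal sketch. The class name PaTextExtractor and the method Extract are hypothetical, not from the question, and the sketch assumes the wanted text never contains a literal double quote. The end tag is passed in as a parameter because the question shows it both as <pa> and, in the code, as &lt;pa&gt;.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class PaTextExtractor
{
    // Hypothetical helper: returns every string that sits between an opening
    // quote and the end tag followed by a closing quote.
    public static List<string> Extract(string content, string endTag)
    {
        // [^"]* keeps the capture from running across an earlier quote,
        // so each match starts at the quote nearest to the end tag.
        string pattern = "\"([^\"]*)" + Regex.Escape(endTag) + "\"";

        var results = new List<string>();
        foreach (Match m in Regex.Matches(content, pattern))
        {
            // Group 1 is the text between the quote and the end tag.
            results.Add(m.Groups[1].Value);
        }
        return results;
    }
}
```

With the file already read into f as in the question, something like File.WriteAllLines(retrivedTextFile, PaTextExtractor.Extract(f, "&lt;pa&gt;").ToArray()) would then write one extracted string per line.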

2 Answers

There is a similar post on how to remove HTML tags using regular expressions. Here is the link.

And another one that you can tweak, here.

Community
Robin Maben

You can use the following approach to extract the data.

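// Sample input: each wanted string starts with a quote and ends with <pa>"
string word = "\"Was? Wo war ich? Ach ja<pa>\"Jain\"Romil<pa>\"";
string[] stringSeparators = new string[] { "<pa>\"" };
string ans = String.Empty;

// Split on the end tag, so each piece ends where a wanted string ended.
string[] text = word.Split(stringSeparators, StringSplitOptions.None);

foreach (string s in text)
{
    // Keep only what follows the first quote in the piece.
    int quote = s.IndexOf("\"");
    if (quote >= 0)
    {
        ans += s.Substring(quote + 1);
    }
}
return ans;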
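
The second part of the question, putting the edited strings back where they came from, could be sketched with Regex.Replace and a MatchEvaluator. This is only a sketch under assumptions: the names PaTextWriter and PutBack are hypothetical, the edited strings are assumed to be in the same order as the originals appear in the file, and they are assumed to contain no literal double quote.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class PaTextWriter
{
    // Hypothetical helper: replaces the i-th quoted string that ends with
    // endTag by edited[i], leaving the rest of the content untouched.
    public static string PutBack(string content, string endTag, IList<string> edited)
    {
        // Opening quote, text without quotes, end tag, closing quote.
        string pattern = "\"([^\"]*)" + Regex.Escape(endTag) + "\"";
        int index = 0;

        string result = Regex.Replace(content, pattern, m =>
        {
            if (index >= edited.Count)
                throw new ArgumentException("Fewer edited strings than matches.");
            // Rebuild the match around the edited text.
            return "\"" + edited[index++] + endTag + "\"";
        });

        if (index != edited.Count)
            throw new ArgumentException("More edited strings than matches.");
        return result;
    }
}
```

Since the file is XML, the edited text would also need any special characters escaped before it is written back.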
Romil Kumar Jain