I have a text file containing the following lines:

<TestInfo."Content"> 
{
  <Label> "Content" 
  <Visible> "true" 
  "This is the text I want to get" 
}

<TestInfo."Content2"> 
{
  <Label> "Content2" 
  <Visible> "true" 
  "I don't want e.g. this" 
}

I want to extract the text: This is the text I want to get

I tried e.g. the following:

string tmp = File.ReadAllText(textfile);
string result = Regex.Match(tmp, @"<Label> ""Content"" \n\s+ <Visible> ""true"" \n\s+ ""(.+?)""", RegexOptions.Singleline).Groups[1].Value;

However, in this case I get only the first word.

So, my output is: This

And I have no idea why...

I would appreciate any help. Thanks!

KJSR
Marc
  • What data format is this? Aren't there parsers that already exist? To me, regexes aren't suited for this – Cid Aug 12 '19 at 12:47
  • You don't need regex. Use Trim to remove spaces and then use StartsWith("\"") to get lines starting with double quotes. – jdweng Aug 12 '19 at 13:01
  • I'm not sure about the format of this, but I'm guessing it's similar to xml/html. Regex cannot be used to parse [xml/html](https://stackoverflow.com/a/1732454/1913185) because of the matching tags. – technogeek1995 Aug 12 '19 at 13:03
  • Online [regex tester](https://regex101.com/r/5D2dcy/1) says it's fine. – Jesse de Wit Aug 12 '19 at 13:04
  • You must be missing `\r`: `@" – Wiktor Stribiżew Aug 12 '19 at 13:28

4 Answers

If you want the entire line after the line that starts with <Visible>, you'd better read the file line by line instead of using File.ReadAllText and a regular expression:

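string result = null;
using (StreamReader sr = new StreamReader(textfile))
{
    while (sr.Peek() >= 0)
    {
        string line = sr.ReadLine();
        // The lines in the sample are indented, so trim before testing the prefix.
        if (line.TrimStart().StartsWith("<Visible>"))
        {
            // The next line holds the quoted text; strip the whitespace and the quotes.
            // ReadLine returns null if <Visible> was the last line of the file.
            result = sr.ReadLine()?.Trim().Trim('"');
            break;
        }
    }
}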
mm8

Try this:

    var tmp = File.ReadAllText("TextFile1.txt");
    var result = Regex.Match(tmp, "This is the text I want to get", RegexOptions.Multiline);
    // Groups.Count is always at least 1, so test Success to detect a failed match.
    if (result.Success)
        Console.WriteLine(result.Value);
    else
        Console.WriteLine("string not found.");

Regards, //jafc

You could change your regex this way:

var result = Regex.Match(tmp, @"<Visible> ""true""\s*""([\S ]+)""", RegexOptions.Singleline).Groups[1].Value;

If you want to get all the matches, not only the first one, you could use Regex.Matches.
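
As a rough sketch of the Regex.Matches variant, the same pattern can be run over the whole text and the first capture group collected from each match. The class and method names (QuotedValueFinder, FindAll) are made up for this illustration, and the sample string is a shortened copy of the file from the question.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class QuotedValueFinder
{
    // Hypothetical helper: returns the quoted text that follows every <Visible> "true" line.
    public static List<string> FindAll(string text)
    {
        var values = new List<string>();
        // Same pattern as above. It contains no '.', '^' or '$',
        // so no RegexOptions are needed here.
        foreach (Match m in Regex.Matches(text, @"<Visible> ""true""\s*""([\S ]+)"""))
        {
            values.Add(m.Groups[1].Value);
        }
        return values;
    }

    public static void Main()
    {
        string sample =
            "<TestInfo.\"Content\"> \n{\n  <Label> \"Content\" \n  <Visible> \"true\" \n" +
            "  \"This is the text I want to get\" \n}\n\n" +
            "<TestInfo.\"Content2\"> \n{\n  <Label> \"Content2\" \n  <Visible> \"true\" \n" +
            "  \"I don't want e.g. this\" \n}\n";

        // Prints one line per block, because both blocks contain <Visible> "true".
        foreach (string value in FindAll(sample))
        {
            Console.WriteLine(value);
        }
    }
}
```

Note that this returns the text from both blocks, so the caller still has to pick the right entry, for example by also matching the <Label> line.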

Maksim Simkin

Thanks a lot for your input! It helped me find a final solution. First, to avoid ambiguities, I extract only the small part of the file that contains the string I want:

string[] tmp = File.ReadAllLines(textfile);
List<string> Content = new List<string>();
bool dumpA = false;
Regex regBEGIN = new Regex(@"<TestInfo\.""Content"">");
Regex regEND = new Regex(@"<TestInfo\.""Content2"">");
foreach (string line in tmp)
{
    if (dumpA)
        Content.Add(line.Trim());
    if (regBEGIN.IsMatch(line))
        dumpA = true;
    if (regEND.IsMatch(line))
        break;
}

Then I can extract the line starting with '"', which now occurs only once:

string result = "";
foreach (string line in Content)
{
    if (line.StartsWith("\""))
    {
       result = line;
       result = result.Replace("\"", "");
       result = result.Trim();
    }
}
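
For what it's worth, the two loops above could probably be folded into a single pass. The following is only a sketch under a few assumptions: every block starts with a `<TestInfo."NAME">` header line, ends with a line containing only `}`, and holds the wanted text on a line that begins with a double quote. BlockReader and ReadQuotedLine are hypothetical names, not part of any library.

```csharp
using System;
using System.Collections.Generic;

public static class BlockReader
{
    // Hypothetical helper: returns the first quoted line inside the named block,
    // without the surrounding quotes, or null if the block or the line is missing.
    public static string ReadQuotedLine(IEnumerable<string> lines, string blockName)
    {
        // The closing "> is part of the header, so "Content" does not match "Content2".
        string header = "<TestInfo.\"" + blockName + "\">";
        bool inBlock = false;

        foreach (string raw in lines)
        {
            string line = raw.Trim();

            if (!inBlock)
            {
                // Skip everything until the header of the wanted block.
                inBlock = line.StartsWith(header, StringComparison.Ordinal);
                continue;
            }

            if (line == "}")
                break; // end of the wanted block (assumed block terminator)

            if (line.StartsWith("\"", StringComparison.Ordinal))
                return line.Trim('"');
        }

        return null;
    }
}
```

It would be called as BlockReader.ReadQuotedLine(File.ReadAllLines(textfile), "Content"), which needs using System.IO in the calling file.
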
Marc