I have a string that contains an entire XML file, tags and all. The tag I'm particularly interested in is <address source>. The catch is that the XML isn't always the same: sometimes there is just one <address source> tag, sometimes there are 5 of them, and sometimes 20 or more.

So, imagine my string is something like this:

string XMLToAnalyze = "<XML><TAG1>somecontent</TAG1><address source>content</address source><address source>content</address source><TAG2>morecontent</TAG2></XML>";

In this particular string, the tag <address source> appears twice.

What I need is this:

I need to find the index (via IndexOf or similar) of each <address source> tag, and I need these indexes stored separately, preferably in separate integers (one integer per index), or alternatively in an array. This is because I need to access each index individually to fill in some fields in a WinForms form.

Is this possible?

Erik Philips

2 Answers

What you need is an IDictionary<string, IList<int>>. As you scan for opening tags, store each one in the dictionary as follows: if the tag is already a key, append the newly found index to its list; otherwise add a new entry whose list contains a single item, the index of the first occurrence. Once you have gone over the whole string, the dictionary holds the map you are looking for.

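using System;
using System.Collections.Generic;

public static class XmlTagMapBuilder
{
    public static IDictionary<string, IList<int>> GetOpenTagIndexMap(string inputXml)
    {
        if (inputXml == null)
        {
            throw new ArgumentNullException("inputXml");
        }

        IDictionary<string, IList<int>> result = new Dictionary<string, IList<int>>();

        int currentIndex = -1;
        string nextOpenTagName;
        int nextOpenTagIndex;
        while (TryGetNextOpenTagIndex(inputXml, currentIndex, out nextOpenTagName, out nextOpenTagIndex))
        {
            currentIndex = nextOpenTagIndex;

            // Normalize once so that lookup and insertion use the same key
            // (tag names are stored upper-cased).
            string key = nextOpenTagName.ToUpperInvariant();

            IList<int> tagIndices;
            if (!result.TryGetValue(key, out tagIndices))
            {
                tagIndices = new List<int>();
                result.Add(key, tagIndices);
            }

            tagIndices.Add(nextOpenTagIndex);
        }

        return result;
    }

    /// <summary>
    /// Tries to get next open tag in the given <paramref name="inputXml"/> string after the specified startIndex.
    /// </summary>
    /// <param name="inputXml">The string which contains the xml tags.</param>
    /// <param name="startIndex">The index after which to look for the open tag.</param>
    /// <param name="nextOpenTagName">If a tag was found, contains its name.</param>
    /// <param name="nextOpenTagIndex">If a tag was found, contains the start index of it.</param>
    /// <returns>true - if the tag was found. false - otherwise.</returns>
    private static bool TryGetNextOpenTagIndex(string inputXml, int startIndex, out string nextOpenTagName, out int nextOpenTagIndex)
    {
        // One possible implementation: a plain character scan, not a real XML parser.
        // It treats everything between '<' and '>' as the tag name (so "address source"
        // is one name) and does not handle CDATA sections or '>' inside attribute values.
        nextOpenTagName = null;
        nextOpenTagIndex = -1;

        int searchFrom = startIndex + 1;
        while (searchFrom < inputXml.Length)
        {
            int open = inputXml.IndexOf('<', searchFrom);
            if (open < 0)
            {
                return false;
            }

            int close = inputXml.IndexOf('>', open + 1);
            if (close < 0)
            {
                return false;
            }

            // Skip closing tags, comments/declarations and processing instructions.
            char first = inputXml[open + 1];
            if (first != '/' && first != '!' && first != '?')
            {
                nextOpenTagName = inputXml.Substring(open + 1, close - open - 1).TrimEnd('/').Trim();
                nextOpenTagIndex = open;
                return true;
            }

            searchFrom = close + 1;
        }

        return false;
    }
}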
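
The add-or-create step described in this answer can also be looked at in isolation. The following is only a minimal sketch: the class name `TagIndexMapSketch` and the helper `AddIndex` are hypothetical names introduced for illustration, and the indexes passed in `Main` are the positions of the tags in the question's sample string rather than the result of any scanning.

```csharp
using System;
using System.Collections.Generic;

public static class TagIndexMapSketch
{
    // Hypothetical helper: appends index to the list stored under tag,
    // creating that list the first time the tag is seen.
    public static void AddIndex(IDictionary<string, IList<int>> map, string tag, int index)
    {
        IList<int> indices;
        if (!map.TryGetValue(tag, out indices))
        {
            indices = new List<int>();
            map.Add(tag, indices);
        }

        indices.Add(index);
    }

    public static void Main()
    {
        var map = new Dictionary<string, IList<int>>();

        // Positions taken from the question's sample string.
        AddIndex(map, "TAG1", 5);
        AddIndex(map, "address source", 29);
        AddIndex(map, "address source", 69);

        // All indexes recorded for one tag, in the order they were found.
        Console.WriteLine(string.Join(", ", map["address source"])); // prints "29, 69"
    }
}
```

Keeping one list per tag means the caller can ask for any tag afterwards without rescanning the string.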
Mark Brackett
Artak

Use Regex to find every occurrence of the tag, then loop through the matches and record the index of each one.

This works; I have tested it.

        List<int> indexes = new List<int>();
        string XMLToAnalyze = "<XML><TAG1>somecontent</TAG1><address source>content</address source><address source>content</address source><TAG2>morecontent</TAG2></XML>";
        var regex = new Regex(@"<address source>");

        foreach (Match match in regex.Matches(XMLToAnalyze))
        {
            indexes.Add(match.Index);
        }

The indexes list then holds the start index of every match.


Output: 29, 69
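
Since the question mentions IndexOf, the same list can also be built without Regex. This is only a sketch of that alternative, not part of the tested answer above; the class name `TagIndexFinder` and the method `FindAll` are hypothetical names chosen for illustration. It repeatedly calls `string.IndexOf`, resuming just past the previous match.

```csharp
using System;
using System.Collections.Generic;

public static class TagIndexFinder
{
    // Returns the start index of every non-overlapping occurrence of tag in text.
    public static List<int> FindAll(string text, string tag)
    {
        var indexes = new List<int>();
        if (string.IsNullOrEmpty(text) || string.IsNullOrEmpty(tag))
        {
            return indexes;
        }

        int position = text.IndexOf(tag, StringComparison.Ordinal);
        while (position >= 0)
        {
            indexes.Add(position);
            // Resume the search just past the match that was found.
            position = text.IndexOf(tag, position + tag.Length, StringComparison.Ordinal);
        }

        return indexes;
    }

    public static void Main()
    {
        string XMLToAnalyze = "<XML><TAG1>somecontent</TAG1><address source>content</address source><address source>content</address source><TAG2>morecontent</TAG2></XML>";
        List<int> indexes = FindAll(XMLToAnalyze, "<address source>");
        Console.WriteLine(string.Join(", ", indexes)); // prints "29, 69"
    }
}
```

Because the search text is a fixed literal, an ordinal comparison is enough here, and no regular-expression escaping is needed.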

Rajshekar Reddy
  • Thanks, this is familiar code to me. One question though, how would I display the `indexes` within a textBox? – SomebodyWithAQuestion Mar 26 '16 at 18:24
  • Are you using Web Forms, MVC, or something else? Where is the text box? – Rajshekar Reddy Mar 26 '16 at 18:26
  • Windows Forms is what I'm using. But I think I found it already: I used `string.Join(Environment.NewLine, indexes)` to display the indexes in a multiline field. Another question, though, as I'm rather new to lists: how would I extract, for example, the second item in the list? – SomebodyWithAQuestion Mar 26 '16 at 18:35
  • By index. `list[0]` gives you the first item in the list, where `list` is a variable of type `List<>`, so `list[1]` gives you the second. – Rajshekar Reddy Mar 26 '16 at 18:42
  • @SomebodyWithAQuestion If this solution fixed your issue, please mark it as the answer, as it may prove useful to future readers. – Rajshekar Reddy Mar 26 '16 at 18:45
  • @Reddy It is not advisable to use regex on XML: http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags?lq=1 – kranthiv Mar 26 '16 at 19:05
  • @kranthiv I understand, but the scenario here is different. It's not parsing, it's just string matching. – Rajshekar Reddy Mar 26 '16 at 19:10
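
The two follow-up questions in the comments (showing the indexes in a text box and reading one item from the list) can be sketched together. The class name `IndexDisplaySketch`, the method `ToMultilineText`, and the control name `indexesTextBox` mentioned in the code comment are all hypothetical, and the list is filled with the two values from the answer's output rather than computed.

```csharp
using System;
using System.Collections.Generic;

public static class IndexDisplaySketch
{
    // Builds one line of text per index, suitable for a multiline text box.
    public static string ToMultilineText(IEnumerable<int> indexes)
    {
        return string.Join(Environment.NewLine, indexes);
    }

    public static void Main()
    {
        var indexes = new List<int> { 29, 69 };

        // In a Windows Forms app this string could be assigned to a TextBox
        // whose Multiline property is true, e.g. indexesTextBox.Text = text;
        // (indexesTextBox is a hypothetical control name).
        string text = ToMultilineText(indexes);
        Console.WriteLine(text);

        // Lists are zero-based, so the second item is at position 1.
        int second = indexes[1];
        Console.WriteLine(second); // prints 69
    }
}
```

Accessing `indexes[1]` throws an `ArgumentOutOfRangeException` when the list has fewer than two items, so checking `indexes.Count` first is worthwhile when the number of tags varies.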