I have the following XML:

string xmlstring= <z:row ows_Article_x0020_Tags='14;#cricket;#21;#Headlines;#19;#Videos' ows__ModerationStatus='0'      ows__Level='1' ows_Last_x0020_Modified='9;#2013-11-26 01:33:01' ows_ID='9' ows_UniqueId='9;#{FEA534D1-F63B-464D-97DE-     AC60798B72D6}' ows_owshiddenversion='9' ows_FSObjType='9;#0' ows_Created_x0020_Date='9;#2013-11-24 22:59:53'  ows_ProgId='9;#' ows_FileLeafRef='9;#Pablo-Ferrero.aspx' ows_PermMask='0x7fffffffffffffff' ows_Modified='2013-11-26     01:33:01' ows_FileRef='9;#sites/Gaslines/NewsAndEvents/Pages/Pablo-Ferrero.aspx' ows_DocIcon='aspx'     ows_Editor='24;#Harshini P Hegde' />\r\n   
<z:row ows_Article_x0020_Tags='20;#Charity;#14;#cricket' ows__ModerationStatus='0' ows__Level='1'   ows_Last_x0020_Modified='10;#2013-11-26 01:30:11' ows_ID='10' ows_UniqueId='10;#{C8D042AE-466F-44E8-940B-   0C9A64130923}' ows_owshiddenversion='8' ows_FSObjType='10;#0' ows_Created_x0020_Date='10;#2013-11-24 23:01:50'  ows_ProgId='10;#' ows_FileLeafRef='10;#Debra-L-Reed.aspx' ows_PermMask='0x7fffffffffffffff' ows_Modified='2013-11-  26 01:3:10' ows_FileRef='10;#sites/Gaslines/NewsAndEvents/Pages/Debra-L-Reed.aspx' ows_DocIcon='aspx'   ows_Editor='24;#Harshini P Hegde' />\r\n   
<z:row ows_Article_x0020_Tags='' ows__ModerationStatus='3' ows__Level='255' ows_Last_x0020_Modified='13;#2013-11-26     01:45:12' ows_ID='13' ows_UniqueId='13;#{81236BD1-AF3B-4D97-BA14-5492F8013251}' ows_owshiddenversion='5'    ows_FSObjType='13;#0' ows_Created_x0020_Date='13;#2013-11-26 01:28:45' ows_ProgId='13;#'    ows_FileLeafRef='13;#TestTagCloudPage.aspx' ows_PermMask='0x7fffffffffffffff' ows_Modified='2013-11-26 01:45:13'    ows_CheckoutUser='24;#Harshini P Hegde' ows_FileRef='13;#sites/Gaslines/NewsAndEvents/Pages/TestTagCloudPage.aspx'  ows_DocIcon='aspx' ows_Editor='24;#Harshini P Hegde' />\r\n</rs:data>\r\n</xml>"

In the full string, the XML above is preceded by the following header:

<xml xmlns:s='uuid:BDC6E3F0-6DA3-11d1-A2A3-00AA00C14882'\r\n     xmlns:dt='uuid:C2F41010-65B3-11d1-A29F-00AA00C14882'\r\n     xmlns:rs='urn:schemas-microsoft-com:rowset'\r\n     xmlns:z='#RowsetSchema'>\r\n
<s:Schema id='RowsetSchema'>\r\n   
<s:ElementType name='row' content='eltOnly' rs:CommandTimeout='30'>\r\n      
<s:AttributeType name='ows_Article_x0020_Tags' rs:name='Article Tags' rs:number='1'>\r\n         

I need to get the following output:

  string result= 14;#cricket;#21;#Headlines;#19;#Videos;20;#Charity;#14;#cricket

i.e. I need the text lying between

`<z:row ows_Article_x0020_Tags=" and " ows__ModerationStatus=`

I tried using LINQ but was not able to do it, so I want to do it using regex. Is it possible to delete everything from the string except the result using regex?

Jinxed

3 Answers


Since you don't have valid XML here, you can treat this string as HTML and parse it with HtmlAgilityPack (available from NuGet):

HtmlDocument hdoc = new HtmlDocument();
hdoc.LoadHtml(xmlstring);
var tags = hdoc.DocumentNode.Descendants()
               .Select(r => r.GetAttributeValue("ows_Article_x0020_Tags", ""));

string result = String.Join("", tags);
// 14;#cricket;#21;#Headlines;#19;#Videos20;#Charity;#14;#cricket

With valid XML, the recommended tool for parsing is LINQ to XML, and the parsing should look like this:

XDocument xdoc = XDocument.Parse(validXmlString);
XNamespace z = "#RowsetSchema";
var tags = xdoc.Descendants(z + "row")
               .Select(r => (string)r.Attribute("ows_Article_x0020_Tags"));
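To get a single joined string out of the LINQ to XML approach, something like the following minimal sketch might do. It assumes the input still contains the literal characters `\r\n` (for example, because it was copied from a debugger) and strips them before parsing. The class name `RowsetTags`, the method name `JoinTags`, and the shortened sample string are made up for illustration; joining with `;` and skipping empty values are also assumptions based on the output requested in the question.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class RowsetTags
{
    // Hypothetical helper: strips literal "\r\n" text, parses the XML,
    // and joins all non-empty ows_Article_x0020_Tags values with ';'.
    public static string JoinTags(string rawXml)
    {
        // Assumption: the string holds a backslash followed by 'r', etc.,
        // not real line breaks, so those four characters are removed.
        string cleaned = rawXml.Replace("\\r\\n", "");

        XDocument xdoc = XDocument.Parse(cleaned);
        XNamespace z = "#RowsetSchema";

        var tags = xdoc.Descendants(z + "row")
                       .Select(r => (string)r.Attribute("ows_Article_x0020_Tags"))
                       .Where(t => !string.IsNullOrEmpty(t)); // skip rows without tags

        return string.Join(";", tags);
    }

    public static void Main()
    {
        // Shortened stand-in for the string in the question.
        string sample =
            "<xml xmlns:rs='urn:schemas-microsoft-com:rowset'\\r\\n" +
            "     xmlns:z='#RowsetSchema'>\\r\\n" +
            "<rs:data>\\r\\n" +
            "<z:row ows_Article_x0020_Tags='14;#cricket;#21;#Headlines;#19;#Videos' ows_ID='9' />\\r\\n" +
            "<z:row ows_Article_x0020_Tags='20;#Charity;#14;#cricket' ows_ID='10' />\\r\\n" +
            "<z:row ows_Article_x0020_Tags='' ows_ID='13' />\\r\\n" +
            "</rs:data>\\r\\n</xml>";

        Console.WriteLine(JoinTags(sample));
        // 14;#cricket;#21;#Headlines;#19;#Videos;20;#Charity;#14;#cricket
    }
}
```

The attribute is looked up without a namespace because unprefixed attributes are not in any namespace, while the `row` element needs the `z` namespace.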
Sergey Berezovskiy
  • The tags are null. The entire XML output is available here: http://pastebin.com/9K8GRZg0 – Jinxed Nov 27 '13 at 09:34
  • Try running a `.Replace()` to remove the literal `\r\n` characters (`validXMLString.Replace("\\r\\n", "")`) – Darkzaelus Nov 27 '13 at 09:39
  • @Jinxed It works just fine with your XML if you remove the literal `\r\n` strings from it (I believe you have them due to copy-paste from the debugger). You also have escaped quotes in `ItemCount=\"8\"`. – Sergey Berezovskiy Nov 27 '13 at 09:39
  • I really don't know what happened, but it worked. Thanks a ton. :) Thank you so much. – Jinxed Nov 27 '13 at 09:47

I can't stress enough how bad an idea it is to extract values from XML using regex, but if you really want to, this should work:

        Regex regex = new Regex("ows_Article_x0020_Tags='([^']*)'");
        var matches = regex.Matches(xmlstring);
        Console.WriteLine(matches[0].Groups[1].Value);
        Console.WriteLine(matches[1].Groups[1].Value);
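If the goal is one joined string rather than indexing individual matches, the captured groups can be projected with LINQ and passed to `string.Join`. The following is only a sketch under assumptions: the class name `TagExtractor`, the method name `ExtractTags`, and the trimmed sample input are invented for illustration, and the `;` separator and the skipping of empty values are inferred from the output requested in the question.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class TagExtractor
{
    // Hypothetical helper: collects every non-empty
    // ows_Article_x0020_Tags value and joins them with ';'.
    public static string ExtractTags(string xml)
    {
        var regex = new Regex("ows_Article_x0020_Tags='([^']*)'");

        var values = regex.Matches(xml)
                          .Cast<Match>()                  // MatchCollection is non-generic on older frameworks
                          .Select(m => m.Groups[1].Value) // text between the single quotes
                          .Where(v => v.Length > 0);      // skip rows with an empty attribute

        return string.Join(";", values);
    }

    public static void Main()
    {
        // Trimmed stand-in for the rows in the question.
        string sample =
            "<z:row ows_Article_x0020_Tags='14;#cricket;#21;#Headlines;#19;#Videos' ows__ModerationStatus='0' />" +
            "<z:row ows_Article_x0020_Tags='20;#Charity;#14;#cricket' ows__ModerationStatus='0' />" +
            "<z:row ows_Article_x0020_Tags='' ows__ModerationStatus='3' />";

        Console.WriteLine(ExtractTags(sample));
        // 14;#cricket;#21;#Headlines;#19;#Videos;20;#Charity;#14;#cricket
    }
}
```

Because the pattern requires `='` directly after the attribute name, it does not match the schema line `name='ows_Article_x0020_Tags'`, where the name appears as a value instead.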
Iain
  • Thank you for this piece of code. Is it possible to remove everything else from the string and keep just the values, in order to avoid looping? – Jinxed Nov 27 '13 at 09:39

I generally use LINQ to fetch values from XML; it makes things much easier.

Example 1: LINQ to read XML

Example 2: I use the code below to get a list of questions and answers for a quiz app:

    public List<QuizQuestions> GetQuiz(int level)
    {
        string docName = "DataModel/Level" + level.ToString() + ".xml";
        XDocument xdoc = XDocument.Load(docName); 
        List<QuizQuestions> book = (from list in xdoc.Descendants("Question")
                                    select new QuizQuestions(list.Element("Quest").Value
                                                             , list.Element("A").Value
                                                             , list.Element("B").Value
                                                             , list.Element("C").Value
                                                             , list.Element("D").Value
                                                             , list.Element("Answer").Value)
                                                             ).OrderBy(a => Guid.NewGuid()).ToList();
        return book;
    }

UPDATE: This works only with valid XML.

Saqib Vaid