
I've seen this question asked already, but I didn't find an answer.

So I get this error:

The ':' character, hexadecimal value 0x3A, cannot be included in a name.

On this code:

    XDocument XMLFeed = XDocument.Load("http://feeds.foxnews.com/foxnews/most-popular?format=xml");
    XNamespace content = "http://purl.org/rss/1.0/modules/content/";

    var feeds = from feed in XMLFeed.Descendants("item")
        select new
        {
            Title = feed.Element("title").Value,
            Link = feed.Element("link").Value,
            pubDate = feed.Element("pubDate").Value,
            Description = feed.Element("description").Value,
            MediaContent = feed.Element(content + "encoded")
        };

    foreach (var f in feeds.Reverse())
    {
        ....
    }

An item looks like this:

<rss>    
<channel>

....items....

<item>
<title>Pentagon confirms plan to create new spy agency</title>
<link>http://feeds.foxnews.com/~r/foxnews/most-popular/~3/lVUZwCdjVsc/</link>
<category>politics</category>
<dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/" />
<pubDate>Tue, 24 Apr 2012 12:44:51 PDT</pubDate>
<guid isPermaLink="false">http://www.foxnews.com/politics/2012/04/24/pentagon-confirms-plan-to-create-new-spy-agency/</guid>
<content:encoded xmlns:content="http://purl.org/rss/1.0/modules/content/"><![CDATA[|http://global.fncstatic.com/static/managed/img/Politics/panetta_hearing_030712.jpg<img src="http://feeds.feedburner.com/~r/foxnews/most-popular/~4/lVUZwCdjVsc" height="1" width="1"/>]]></content:encoded>
<description>The Pentagon confirmed Tuesday that it is carving out a brand new spy agency expected to include several hundred officers focused on intelligence gathering around the world.&amp;#160;</description>
<dc:date xmlns:dc="http://purl.org/dc/elements/1.1/">2012-04-4T19:44:51Z</dc:date>
<feedburner:origLink>http://www.foxnews.com/politics/2012/04/24/pentagon-confirms-plan-to-create-new-spy-agency/</feedburner:origLink>
</item>

....items....

</channel>
</rss>    

All I want is to get "http://global.fncstatic.com/static/managed/img/Politics/panetta_hearing_030712.jpg", and before that to check whether content:encoded exists.

Thanks.

EDIT: I've found a sample I can show, and edited the code that tries to handle it.

EDIT2: I've done it in the ugly way:

    text.Replace("content:encoded", "contentt").Replace("xmlns:content=\"http://purl.org/rss/1.0/modules/content/\"","");

and then get the element in the normal way:

    MediaContent = feed.Element("contentt").Value

Nir
  • Hmmm, are you allowed to import a namespace on a non-root element? – leppie Apr 25 '12 at 11:51
  • I really don't know xml parsing well enough to answer you. But, as I said, this is only one item - there are <rss> and <channel> elements above it.. – Nir Apr 25 '12 at 11:55
  • possible duplicate of [The ':' character, hexadecimal value 0x3A, cannot be included in a name](http://stackoverflow.com/questions/2575546/the-character-hexadecimal-value-0x3a-cannot-be-included-in-a-name) – ChrisF Apr 25 '12 at 11:58
  • Saw that, and tried Vlad's reply - neither works. – Nir Apr 25 '12 at 12:01
  • Possible duplicate of [“The ':' character, hexadecimal value 0x3A, cannot be included in a name”](http://stackoverflow.com/questions/7213251/the-character-hexadecimal-value-0x3a-cannot-be-included-in-a-name). –  Mar 29 '14 at 23:43

2 Answers

You should use XNamespace:

    XNamespace content = "...";

    // later in your code ...
    MediaContent = feed.Element(content + "encoded")

See more details here.

(Of course, the string assigned to content must be the same as the one in xmlns:content="...".)

Vlad
  • @nir: what exactly? `feed.Element`? – Vlad Apr 25 '12 at 12:05
  • In the foreach f.MediaContent, for example if xmlns:content="http://www.site.com" then XNamespace content = "http://www.site.com" – Nir Apr 25 '12 at 12:07
  • @nir: http://msdn.microsoft.com/en-us/library/system.xml.linq.xcontainer.element.aspx: null is returned if the subelement is not found. are you sure your namespace is correct? try to dump the item and see whether the element is there. – Vlad Apr 25 '12 at 12:10

The following code

    static void Main(string[] args)
    {

            var XMLFeed = XDocument.Parse(
@"<rss>    
<channel>

....items....

<item>
<title>Pentagon confirms plan to create new spy agency</title>
<link>http://feeds.foxnews.com/~r/foxnews/most-popular/~3/lVUZwCdjVsc/</link>
<category>politics</category>
<dc:creator xmlns:dc='http://purl.org/dc/elements/1.1/' />
<pubDate>Tue, 24 Apr 2012 12:44:51 PDT</pubDate>
<guid isPermaLink='false'>http://www.foxnews.com/politics/2012/04/24/pentagon-confirms-plan-to-create-new-spy-agency/</guid>
<content:encoded xmlns:content='http://purl.org/rss/1.0/modules/content/'><![CDATA[|http://global.fncstatic.com/static/managed/img/Politics/panetta_hearing_030712.jpg<img src='http://feeds.feedburner.com/~r/foxnews/most-popular/~4/lVUZwCdjVsc' height='1' width='1'/>]]></content:encoded>
<description>The Pentagon confirmed Tuesday that it is carving out a brand new spy agency expected to include several hundred officers focused on intelligence gathering around the world.&amp;#160;</description>
<dc:date xmlns:dc='http://purl.org/dc/elements/1.1/'>2012-04-4T19:44:51Z</dc:date>
<!-- <feedburner:origLink>http://www.foxnews.com/politics/2012/04/24/pentagon-confirms-plan-to-create-new-spy-agency/</feedburner:origLink> -->
</item>

....items....

</channel>
</rss>");
            XNamespace contentNs = "http://purl.org/rss/1.0/modules/content/";
            var feeds = from feed in XMLFeed.Descendants("item")
                        select new
                                   {
                                       Title = (string)feed.Element("title"),
                                       Link = (string)feed.Element("link"),
                                       pubDate = (string)feed.Element("pubDate"),
                                       Description = (string)feed.Element("description"),
                                       MediaContent = GetMediaContent((string)feed.Element(contentNs + "encoded"))
                                   };
            foreach(var item in feeds)
            {
                Console.WriteLine(item);
            }
        }

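        private static string GetMediaContent(string content)
        {
            // content is null when the item has no content:encoded element
            if (string.IsNullOrEmpty(content))
            {
                return string.Empty;
            }

            int imgStartPos = content.IndexOf("<img");
            if(imgStartPos > 0)
            {
                // skip the leading '|' if there is one
                int startPos = content[0] == '|' ? 1 : 0;

                // everything between the optional '|' and the <img tag
                return content.Substring(startPos, imgStartPos - startPos);
            }

            return string.Empty;
        }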

results in:

{ Title = Pentagon confirms plan to create new spy agency, Link = http://feeds.f
oxnews.com/~r/foxnews/most-popular/~3/lVUZwCdjVsc/, pubDate = Tue, 24 Apr 2012 1
2:44:51 PDT, Description = The Pentagon confirmed Tuesday that it is carving out
 a brand new spy agency expected to include several hundred officers focused on
intelligence gathering around the world.&#160;, MediaContent = http://global
.fncstatic.com/static/managed/img/Politics/panetta_hearing_030712.jpg }
Press any key to continue . . .

A few points:

  • You never want to treat Xml as text. In your case you removed the namespace declaration, but if the namespace had been declared as a default namespace (i.e. without binding it to a prefix), or if a different prefix had been used, your code would not work, even though all of those documents are semantically equivalent.
  • Unless you know what's inside a CDATA section and how to handle it, always treat it as text. If you know it's something else, you can handle it differently after parsing; see my notes on CDATA below for more details.
  • To avoid a NullReferenceException when an element is missing, I used the explicit conversion operator (string) instead of calling .Value.
  • The Xml you posted was not valid: the namespace Uri for the feedburner prefix was missing.
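
To illustrate the first and third points, here is a minimal sketch. The class name NamespaceDemo, the helper ReadEncoded and the tiny sample items are made up for this illustration and are not taken from the feed. It assumes only that the element lives in the http://purl.org/rss/1.0/modules/content/ namespace, and shows that the prefix used in the document does not matter and that the (string) conversion gives null when the element is absent:

```csharp
using System;
using System.Xml.Linq;

static class NamespaceDemo
{
    // Hypothetical helper: returns the text of the "encoded" element in the
    // RSS content namespace, or null when the item has no such element.
    public static string ReadEncoded(string itemXml)
    {
        XNamespace contentNs = "http://purl.org/rss/1.0/modules/content/";
        XElement item = XElement.Parse(itemXml);

        // The explicit conversion returns null for a missing element,
        // whereas .Value would throw a NullReferenceException.
        return (string)item.Element(contentNs + "encoded");
    }

    static void Main()
    {
        // The same namespace Uri written three different ways (made-up sample data).
        string prefixed = "<item><content:encoded xmlns:content='http://purl.org/rss/1.0/modules/content/'>a</content:encoded></item>";
        string otherPrefix = "<item><c:encoded xmlns:c='http://purl.org/rss/1.0/modules/content/'>a</c:encoded></item>";
        string defaultNs = "<item><encoded xmlns='http://purl.org/rss/1.0/modules/content/'>a</encoded></item>";
        string missing = "<item><title>t</title></item>";

        Console.WriteLine(ReadEncoded(prefixed));
        Console.WriteLine(ReadEncoded(otherPrefix));
        Console.WriteLine(ReadEncoded(defaultNs));
        Console.WriteLine(ReadEncoded(missing) == null ? "no content:encoded" : "found");
    }
}
```

A text-based Replace on "content:encoded" would only handle the first of the three spellings, which is why querying by XNamespace is the safer choice.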

This is no longer related to the problem, but it may be helpful for some folks, so I am leaving it here.

As for the contents of the encoded element: they are inside a CDATA section. What is inside a CDATA section is not Xml but plain text. CDATA is usually used to avoid escaping the '<', '>' and '&' characters (without CDATA they would have to be written as &lt;, &gt; and &amp; so as not to break the Xml document itself); the Xml processor treats the characters in a CDATA section as if they had been escaped. CDATA is convenient if you want to embed html, because the embedded content looks textually like the original, yet it won't break your xml if the html is not well-formed Xml.

Since the CDATA content is text and not Xml, it cannot be queried as Xml. You will probably need to treat it as text and use, for instance, regular expressions. If you know it is valid Xml, you can load the contents into an XElement again and process that. In your case you have mixed content (text followed by an element), so this is not easy unless you use a little dirty hack. Everything would be easy if there were just one top-level element instead of mixed content, so the hack is to add a wrapping element yourself (<content> in the snippet here). Inside the foreach loop you can do something like this:

    var mediaContentXml = XElement.Parse("<content>" + (string)item.MediaContent + "</content>");
    Console.WriteLine((string)mediaContentXml.Element("img").Attribute("src"));

Again, it's not pretty and it is a hack, but it will work if the content of the encoded element is valid Xml. The more correct way of doing this is to use XmlReader with ConformanceLevel set to Fragment and recognize each kind of node appropriately to create the corresponding Linq to Xml node.
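
A rough sketch of that XmlReader approach could look like the following. The names FragmentDemo and ParseFragment and the example.invalid URLs are placeholders invented for this illustration. It assumes the fragment is well-formed apart from having more than one top-level node, and it lets XNode.ReadFrom decide which Linq to Xml node type to create instead of switching on the node type by hand:

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.Linq;

static class FragmentDemo
{
    // Hypothetical helper: parses a string that may hold several top-level
    // nodes (text, elements, ...) into the matching Linq to Xml nodes.
    public static List<XNode> ParseFragment(string fragment)
    {
        var settings = new XmlReaderSettings
        {
            // Allow more than one top-level node and top-level text.
            ConformanceLevel = ConformanceLevel.Fragment
        };

        var nodes = new List<XNode>();
        using (XmlReader reader = XmlReader.Create(new StringReader(fragment), settings))
        {
            // Position the reader on the first content node.
            reader.MoveToContent();
            while (!reader.EOF)
            {
                // ReadFrom consumes the current node and leaves the reader on the next one.
                nodes.Add(XNode.ReadFrom(reader));
            }
        }
        return nodes;
    }

    static void Main()
    {
        // Made-up fragment shaped like the CDATA payload: text followed by an img element.
        string payload = "|http://example.invalid/a.jpg<img src='http://example.invalid/b' height='1' width='1'/>";

        foreach (XNode node in ParseFragment(payload))
        {
            Console.WriteLine(node.NodeType + ": " + node);
        }
    }
}
```

Compared with wrapping the text in an extra element, this keeps the leading text and the img element as separate nodes, so either one can be picked out without string searching. It still fails with an XmlException if the payload is not well-formed.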

Pawel
  • Well, this example is simply pasted from a feed. And I don't understand how you suggest solving my problem with content:encoded – Nir Apr 26 '12 at 09:35
  • I could not find http://www.site.com/image.jpg in your Xml and the title of the question was about namespaces. Anyways, I assume that you are looking for the value of the src attribute of the image element that is in the CDATA section of the encoded element. I updated my response accordingly. – Pawel Apr 26 '12 at 17:01
  • I fixed the question a few times and didn't update this; I have now. And I don't want what's inside the src, rather I want what's between the '|' and the '<img' – Nir Apr 26 '12 at 18:02
  • you just need to parse it. I updated the code and the result accordingly – Pawel Apr 26 '12 at 18:48
  • It works. So because it sees only the CDATA in content:encoded, it returns null into MediaContent? And thanks a lot! – Nir Apr 27 '12 at 13:56