1

Below is a code in C# for rss reader, why this code is bad? this class generates a list of the 5 most recent posts, sorted by title. What do you use to analyze code in C#?

    static Story[] Parse(string content)
    {
        var items = new List<string>();
        int start = 0;
        while (true)
        {

            var nextItemStart = content.IndexOf("<item>", start);
            var nextItemEnd = content.IndexOf("</item>", nextItemStart);
            if (nextItemStart < 0 || nextItemEnd < 0) break;

            String nextItem = content.Substring(nextItemStart, nextItemEnd + 7 - nextItemStart);
            items.Add(nextItem);
            start = nextItemEnd;
        }

        var stories = new List<Story>();
        for (byte i = 0; i < items.Count; i++)
        {
            stories.Add(new Story()
            {
                title = Regex.Match(items[i], "(?<=<title>).*(?=</title>)").Value,
                link = Regex.Match(items[i], "(?<=<link>).*(?=</link>)").Value,
                date = Regex.Match(items[i], "(?<=<pubDate>).*(?=</pubdate>)").Value
            });
        }

        return stories.ToArray();
    }
Sofic
  • 57
  • 1
  • 6
  • 1
    Could you please describe what you mean by "bad"? Although I can say one thing; why not use LINQ to XML to parse this? – Andrew Barber Sep 26 '12 at 00:55
  • bad means a garbage code or has stylistic, functionality, maintainability problems. – Sofic Sep 26 '12 at 00:56
  • Something can not really be good or bad without a metric of some sort. This code could be done in several ways but without understanding what you are trying to maximize it's hard to understand what to improve. – madmik3 Sep 26 '12 at 00:56
  • what is the disadvantage of using code like this and not the XmlReader or LINQ to Xml? – Sofic Sep 26 '12 at 00:59
  • 1
    For the question on what to use to analyze C# code, I would recommend (1) your experience and (2) your intellect. – bryanmac Sep 26 '12 at 01:03
  • 1
    possible duplicate of [RegEx match open tags except XHTML self-contained tags](http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags) - there are already plenty of answers why one should not be using Regex to parse HTML/XHTML/XML... – Alexei Levenkov Sep 26 '12 at 01:24

6 Answers6

4

why don't use XmlReader or XmlDocument or LINQ to Xml?

yukaizhao
  • 681
  • 5
  • 17
  • +1 - exactly - parsers we're designed for problems it looks like that code is trying to address. Also, doing RegEx matches inside a order n byte loop may be expensive. Having magic number like 7 that happens to align with a string constant further up in the code etc... – bryanmac Sep 26 '12 at 01:00
3

It is bad because it's using string parsing when there are excellent classes in the framework for parsing XML. Even better, there are classes to deal with RSS feeds.

ETA:

Sorry to not have answered your second question earlier. There are a great number of tools to analyze correctness and quality of C# code. There's probably a huge list compiled somewhere, but here's a few I use on a daily basis to help ensure quality code:

  • StyleCop (code formatting standards)
  • Resharper (idiomatic programming, gotcha catching)
  • FxCop (code correctness, standards adherence, idiomatic programming)
  • Pex (white box testing)
  • Nitriq (code quality metrics)
  • NUnit (unit testing)
Jesse C. Slicer
  • 19,901
  • 3
  • 68
  • 87
1

You shouldn't parse XML with string functions and regular expressions. XML can get very complicated and be formatted many ways that a real XML parser like XmlReader can handle, but will break your simple string parsing code.

Basically: don't try and reinvent the wheel (an xml parser), especially when you don't realize how complicated that wheel actually is.

PherricOxide
  • 15,493
  • 3
  • 28
  • 41
1

I think the worst thing for the code is the performance issue. You should parse the xml string into a XDocument(or similar structure) instead of parse it again and again using regex.

Cheng Chen
  • 42,509
  • 16
  • 113
  • 174
1

For starters, it uses byte as a indexer instead of int (what if items has more items in it than a byte can represent?). It doesn't use idiomatic C# (see user1645569's response). It also unnecessarily uses var instead of specific data types (that's more stylistic though, but for me I prefer not to, hence by my metric it's not ideal (and you've given no other metric)).

Let me clarify what I am saying about "unnecessarily using var": var in and of itself is not bad, and I am not suggesting that. I am (mostly) suggesting the usage here isn't very consistent. For example, explicitly declaring start as an int, but then declaring nextItemEnd as var (which will deduce to int) and assigning nextItemEnd to start seems (to me) like a weird mixture between wanting to automatically deduce a variable's type and explicitly declaring it. I think it's good that var was not used in start's declaration (because then it's not exactly clear if the intent is an integer or a floating point number), but I (personally) don't think it helps any to declare nextItemStart and nextItemEnd as var. I tend to prefer to use var for more complex/longer data types (similar to how I use auto in C++ for iterators, but not for "simpler" data types).

Cornstalks
  • 37,137
  • 18
  • 79
  • 144
  • Explanation for the -1, please? – Cornstalks Sep 26 '12 at 01:01
  • 3
    I haven't down voted, but using var for local variables is recommended – Sleiman Jneidi Sep 26 '12 at 01:08
  • @sleimanjneidi: Thanks, perhaps that's why. I'll clarify to say that `var` in and of itself isn't bad. I've got nothing against `var`; I just don't think it was used great here (if he declares `start` as `int`, and then `nextItemEnd` as `var`, and then assigns `nextItemEnd` to `start`, I would personally prefer him to be consistent with explicitly declaring types instead of switching between `int` and `var`). – Cornstalks Sep 26 '12 at 01:33
1

simply because you are reinventing a xml parser , use Linq to xml instead, it is very simple and clean.I am sure that you can do all the above in three lines of code if you use Linq to XML , your code uses a lot of magic numbers (ex: 7-n ..), the thing that make it unstable and non generic

Sleiman Jneidi
  • 22,907
  • 14
  • 56
  • 77