3

I want to make a simple CSV parser. It should go through a list of comma-separated values and put them in an IList<int>. The values are expected to be integers. If a value cannot be parsed, I just want to omit it.

This is the code I have so far:

csv.Split(',').Select(item =>
{
    int parsed;
    if (int.TryParse(item, out parsed))
    {
        return parsed;
    }
    continue; //is not allowed here
}).ToList();

However, the use of continue is (of course) not allowed here. How can I omit a value in my Select implementation?

Note: Of course I could use a foreach or a LINQ query expression, but I wonder how to do it with a lambda.

JiBéDoublevé
  • 4,124
  • 4
  • 36
  • 57
Marcel
  • 15,039
  • 20
  • 92
  • 150

8 Answers

9

How about:

public static IEnumerable<int> ExtractInt32(this IEnumerable<string> values) {
    foreach(var s in values) {
        int i;
        if(int.TryParse(s, out i)) yield return i;
    }
}

then:

var vals = csv.Split(',').ExtractInt32().ToList();

The nice things here:

  • avoids magic "sentinel" numbers (like int.MinValue)
  • avoids a separate and disconnected "it is valid" / "parse" step (so no duplication)
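
For reference, here is a minimal self-contained sketch of the iterator-block approach above as a console program. The class names StringParsingExtensions and Program and the sample input are illustrative assumptions, not part of the original answer:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class StringParsingExtensions
{
    // Lazily yields only the strings that parse as Int32; all others are skipped.
    public static IEnumerable<int> ExtractInt32(this IEnumerable<string> values)
    {
        foreach (var s in values)
        {
            int i;
            if (int.TryParse(s, out i)) yield return i;
        }
    }
}

public static class Program
{
    public static void Main()
    {
        var csv = "1,2,abc,,40"; // illustrative input (hypothetical)
        List<int> vals = csv.Split(',').ExtractInt32().ToList();
        Console.WriteLine(string.Join(" ", vals)); // prints "1 2 40"
    }
}
```

Each item is parsed exactly once, and nothing is evaluated until ToList() enumerates the sequence.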
Marc Gravell
  • 1,026,079
  • 266
  • 2,566
  • 2,900
  • Actually, on second thought, this is the best answer in my opinion. – Fabian Tamp Feb 26 '13 at 12:05
  • @Laurant the good thing about "magic" is that you can look behind the curtain and see how the trick is done. IIRC Jon's book "C# in Depth", chapter 6, covers this in exquisite detail. And that chapter may still be free (it was a demo chapter for a long while). – Marc Gravell Feb 26 '13 at 12:30
  • Although no lambda is used, as was my first intention, this is the most elegant solution. Thanks @MarcGravell – Marcel Feb 26 '13 at 12:41
  • 2
    @Marcel sometimes, if the question is "what is the best hammer to drive screws?", the most appropriate answer involves a screw-driver instead. – Marc Gravell Feb 26 '13 at 12:43
2

Select transforms a value; it doesn't filter. Filtering is what Where does:

csv.Split(',')
   .Select(item =>
           {
               int parsed;
               return new { IsNumber = int.TryParse(item, out parsed), 
                            Value = parsed };
           })
   .Where(x => x.IsNumber)
   .Select(x => x.Value);

Additionally, please see this answer for a clever, short way of doing it. Please note that the meaning of "clever" isn't entirely positive here.

Community
  • 1
  • 1
Daniel Hilgarth
  • 171,043
  • 40
  • 335
  • 443
  • Your solution should work, however, the usage of a new anonymous type embarrasses me. This seems to be much more expensive than the foreach alternative. – Marcel Feb 26 '13 at 12:11
  • 2
    @Marcel: Indeed. I would never use code like this. I would go with a loop here. But you wanted to do it in LINQ (see your last sentence). – Daniel Hilgarth Feb 26 '13 at 12:12
  • If one dislikes anonymous types, you can use a nullable type, as in `.Select(item => { int parsed; return int.TryParse(item, out parsed) ? parsed : (int?)null; })`. Probably not faster than the anonymous type, though. Your linked answer where you use "closure" in a shocking way, fascinates. – Jeppe Stig Nielsen Feb 26 '13 at 12:41
  • @JeppeStigNielsen: The important point of the other answer actually is the deferred nature of LINQ to objects. Good idea with the nullable type. – Daniel Hilgarth Feb 26 '13 at 12:57
1

I think you have three options:

  1. Use SelectMany instead, which will allow you to return an empty enumerable for elements you wish to omit (and an enumerable of length 1 otherwise).
  2. Use an int value you are sure won't be in the set (e.g. -1) to represent 'omitted' and filter them out afterwards. This approach is fragile as you may pick a value that subsequently appears in the set which will result in a subtle bug. (You could mitigate this by using a larger data type, e.g. long and picking a value outside the range of int but then you will need to convert back to int subsequently.)
  3. Use Nullable<int> (int?) instead and filter out the null values afterwards.

1:

csv.Split(',').SelectMany(item =>
    {
        int parsed;
        if (int.TryParse(item, out parsed))
        {
            return new[] {parsed};
        }

        return Enumerable.Empty<int>();   
    });

3:

csv.Split(',').Select(item =>
    {
        int parsed;
        if (int.TryParse(item, out parsed))
        {
            return (int?) parsed;
        }

        return (int?) null;
    })
    .Where(item => item.HasValue)
    .Select(item => item.Value);
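
Option 2 has no code above. A rough sketch of its mitigated form (a sentinel outside the int range, carried in a long) might look like the following. The constant name Omitted and the sample input are illustrative assumptions:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class Program
{
    public static void Main()
    {
        var csv = "7,x,0,8"; // illustrative input (hypothetical)

        // Sentinel outside the int range, so no successfully parsed value can collide with it.
        const long Omitted = long.MinValue;

        List<int> result = csv.Split(',')
            .Select(item =>
            {
                int parsed;
                return int.TryParse(item, out parsed) ? (long)parsed : Omitted;
            })
            .Where(value => value != Omitted)
            .Select(value => (int)value) // safe: every remaining value came from an int
            .ToList();

        Console.WriteLine(string.Join(" ", result)); // prints "7 0 8"
    }
}
```

The cast back to int is the extra conversion step the option mentions.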
Paul Ruane
  • 37,459
  • 12
  • 63
  • 82
  • Good idea, but still makes this whole thing more complex than I think should be possible, either way. – Marcel Feb 26 '13 at 12:16
1

One way is to return some default value and then skip it.

int errorInt = int.MinValue; // sentinel: a genuine int.MinValue in the input would be dropped too
csv.Split(',').Select(item =>
{
    int parsed;
    if (int.TryParse(item, out parsed))
    {
        return parsed;
    }
    else
    {
        return errorInt;
    }

}).Where(val => val != errorInt).ToList();
Dmitrii Dovgopolyi
  • 6,231
  • 2
  • 27
  • 44
1

Why not use Where on the array first, and only then select the valid ints?

csv.Split(',')
    .Where(item => 
          { 
              int parsed; 
              return int.TryParse(item, out parsed); 
          })
    .Select(item => Convert.ToInt32(item));
Anatolii Gabuza
  • 6,184
  • 2
  • 36
  • 54
  • 1
    Er, wouldn't char work only on chars, not a string? My numbers can have multiple digits. – Marcel Feb 26 '13 at 12:08
1

try this:

int dummy;
csv.Split(',').Where(c => int.TryParse(c, out dummy)).Select(c => int.Parse(c));

The int.TryParse(..) call just checks whether the string can be converted to an int. The out parameter is simply ignored; we don't need it.

We know that only those string values that make it to the Select() are values that can safely be parsed as ints.
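
If a compiler supporting C# 7.0 or later is available (an assumption; that version was released after this thread), the dummy variable can be replaced by a discard. A minimal sketch with illustrative input:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class Program
{
    public static void Main()
    {
        var csv = "10,foo,20"; // illustrative input (hypothetical)

        List<int> result = csv.Split(',')
            .Where(c => int.TryParse(c, out _)) // "out _" discards the parsed value (C# 7.0+)
            .Select(c => int.Parse(c))          // second parse, as in the code above
            .ToList();

        Console.WriteLine(string.Join(" ", result)); // prints "10 20"
    }
}
```

This removes the shared dummy variable but still parses each valid item twice.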

Jens Kloster
  • 11,099
  • 5
  • 40
  • 54
  • A nice idea. But what tastes bad is calling the parse method twice, when one call can give me all information I need. – Marcel Feb 26 '13 at 12:18
  • 1
    I wouldn't mind :) there has to be a really huge amount of data for it to matter. If that is the case - perhaps a `Parallel.ForEach()` would be the answer – Jens Kloster Feb 26 '13 at 12:20
0

I would probably just use:

csv.Split(',').Where(item => isValid(item)).Select(item => TransformationExpression(item));

or,

csv.Split(',').Select(item => ReturnsDummyValueIfInvalid(item)).Where(item => item != DummyValue);
Fabian Tamp
  • 4,416
  • 2
  • 26
  • 42
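0

int TempInt;
// Where only filters, so the surviving strings still have to be converted to int.
List<int> StuffIWant = csv.Split(',').Where(item => int.TryParse(item, out TempInt)).Select(item => int.Parse(item)).ToList();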
Rob Hardy
  • 1,821
  • 15
  • 15