26

I have a csv string containing doubles (e.g "0.3,0.4,0.3"), and I want to be able to output a double array containing the cumulative sum of these numbers (e.g [0.3,0.7,1.0]).

So far, I have

double[] probabilities = textBox_f.Text.Split(new char[]{','}).Select(s => double.Parse(s)).ToArray();

which gives the numbers as an array, but not the cumulative sum of the numbers.

Is there any way to continue this expression to get what I want, or do I need to use iteration to create a new array from the array I already have?

simonalexander2005
  • 4,338
  • 4
  • 48
  • 92
  • 2
    I like to learn new technologies and ways of doing things. It's perfectly possible other ways are better or quicker, but this is something I don't know how to do, and so would like to – simonalexander2005 Jan 28 '11 at 00:26
  • And why? If a solution without LINQ is both quicker to type *and* faster to execute, why should you be interested in the LINQ solution? And why LINQ specifically, anyway — why not ask about a solution that uses generics, or `dynamic`, or any other random feature that it unnecessary to answer the question? – Timwi Jan 28 '11 at 00:34
  • `Split(new char[]{','})` may equivalently be written `Split(',')` since the parameter is declared with `params`. – Jeffrey L Whitledge Jan 28 '11 at 20:02
  • 3
    Old question, but I have to comment. The question is about how you do this in LINQ, not how you do it without LINQ. – Peter Hedberg Jun 12 '12 at 08:53
  • Make sure you really want `double` and not `decimal` (http://stackoverflow.com/questions/803225/when-should-i-use-double-instead-of-decimal) – JohnB Sep 23 '13 at 22:54
  • 2
    @simonalexander2005 I applaud you for asking the question but I must wonder why you selected the answer by Blindy that (currently) has -3 points? It's definitely not a good answer. Personally I liked Eric Lippert's or Andrey's answers better. Just wondering your take on this question almost 5 years later. – Kristopher Oct 23 '15 at 13:50

11 Answers11

58

There's a time for generality, and there's a time for solving the problem actually posed. This is one of the latter times. If you want to make a method that turns a sequence of doubles into a sequence of partial sums, then just do that:

public static IEnumerable<double> CumulativeSum(this IEnumerable<double> sequence)
{
    double sum = 0;
    foreach(var item in sequence)
    {
        sum += item;
        yield return sum;
    }        
}

Easy. No messing around with aggregates and complicated queries and whatnot. Easy to understand, easy to debug, easy to use:

textBox_f.Text
    .Split(new char[]{','})
    .Select(s => double.Parse(s))
    .CumulativeSum()
    .ToArray();

Now, I note that if that is user input then double.Parse can throw an exception; it might be a better idea to do something like:

public static double? MyParseDouble(this string s)
{
    double d;
    if (double.TryParse(s, out d))
        return d;
    return null;
}

public static IEnumerable<double?> CumulativeSum(this IEnumerable<double?> sequence)
{
    double? sum = 0;
    foreach(var item in sequence)
    {
        sum += item;
        yield return sum;
    }        
}
...
textBox_f.Text
    .Split(new char[]{','})
    .Select(s => s.MyParseDouble())
    .CumulativeSum()
    .ToArray();

and now you don't get an exception if the user makes a typing mistake; you get nulls.

Eric Lippert
  • 647,829
  • 179
  • 1,238
  • 2,067
26

I had a similar requirement some time ago. Basically, I needed to do an aggregation, but I also needed to select each intermediate value. So I wrote an extension method named SelectAggregate (probably not the most appropriate name, but I couldn't find anything better then) that can be used like that:

double[] numbers = new [] { 0.3, 0.4, 0.3 };
double[] cumulativeSums = numbers.SelectAggregate(0.0, (acc, x) => acc + x).ToArray();

Here's the code :

    public static IEnumerable<TAccumulate> SelectAggregate<TSource, TAccumulate>(
        this IEnumerable<TSource> source,
        TAccumulate seed,
        Func<TAccumulate, TSource, TAccumulate> func)
    {
        source.CheckArgumentNull("source");
        func.CheckArgumentNull("func");
        return source.SelectAggregateIterator(seed, func);
    }

    private static IEnumerable<TAccumulate> SelectAggregateIterator<TSource, TAccumulate>(
        this IEnumerable<TSource> source,
        TAccumulate seed,
        Func<TAccumulate, TSource, TAccumulate> func)
    {
        TAccumulate previous = seed;
        foreach (var item in source)
        {
            TAccumulate result = func(previous, item);
            previous = result;
            yield return result;
        }
    }
Thomas Levesque
  • 286,951
  • 70
  • 623
  • 758
6

You want to use the Aggregate operator, with a List<double> as the aggregation accumulator. That way you can produce a projection which is itself a sequence of sums.

Here's an example to get you started:

double[] runningTotal = textBox_f.Text
            .Split(new char[]{','})
            .Select(s => double.Parse(s))
            .Aggregate((IEnumerable<double>)new List<double>(), 
                       (a,i) => a.Concat(new[]{a.LastOrDefault() + i}))
            .ToArray();
LBushkin
  • 129,300
  • 32
  • 216
  • 265
  • 1
    this is inefficient, it will calculate Sum every time making it O(n^2) – Andrey Jan 28 '11 at 00:34
  • Great answer, and got me looking at aggregates - but in the end I chose another one to use. Thanks though – simonalexander2005 Jan 28 '11 at 00:36
  • @Andrey - My original implementation was incorrect, actually. I meant to use LastOrDefault() ... since the last item in the running total is already the cumulative sum of all the items before it. The only inneficiency remaining is the creation of intermediate sequences ... which for most use cases should be minimal. The implementation now compiles and produces the expected result. – LBushkin Jan 28 '11 at 00:46
3
var input=new double[]{ ... }
double sum=0;

var output=input
    .Select(w=>sum+=w);
Blindy
  • 65,249
  • 10
  • 91
  • 131
  • I modified this to: prob = textBox_prob.Text.Split(new char[]{','}).Select(s => double.Parse(s)).ToArray().Select(w=>sum+=w).ToArray(); which seemed more concise. Otherwise great, thanks :) – simonalexander2005 Jan 28 '11 at 00:34
  • 13
    Using LINQ queries just for their side-effects is a bad idea for a number of reasons - not least of which is the fact that future readers of the code won't expect this ... it violates the principle of least surprise. – LBushkin Jan 28 '11 at 00:35
  • @LBushkin while generally i agree this one is too simple. – Andrey Jan 28 '11 at 00:36
  • 2
    Wow, very short... and very hacky. Please don’t ask me to maintain your code! – Timwi Jan 28 '11 at 00:36
  • @simon, You have a useless `ToArray()` in the middle there. @LBushkin, such a buzz-kill :) – Blindy Jan 28 '11 at 00:37
  • 9
    This is a really, really bad idea. LBushkin is right. *Please do not write queries that have side effects*. Queries should *query* data, not *modify* it. – Eric Lippert Jan 28 '11 at 06:32
  • @Eric, I was actually wondering what you thought of it. Unfortunately Linq lacks a "map" operator unlike every other functional language. `Select` is the closest we have. – Blindy Jan 28 '11 at 13:00
  • 7
    @Blindy: Can you explain how Select differs from map? (And how is mutation of a variable as a side effect of a query comprehension "functional"?) – Eric Lippert Jan 28 '11 at 15:16
  • 3
    As a specific example of why this is such a bad idea, the fact that `var l1 = output.ToList(); var l2 = output.ToList();` results in lists with different contents is likely to be very surprising. – kvb Jan 28 '11 at 15:48
  • @Eric, it's more of a naming thing I think. When you see "map", you expect your sequence to change entirely. About your second point... I think if you asked a Haskell programmer to come up with a cummulative sum of an array, he'll write something similar to this, and everyone would find it normal. – Blindy Jan 28 '11 at 18:19
  • @kvb, fair enough, but you can set `output` to be the list itself instead of the enumerable, thus hiding it altogether. – Blindy Jan 28 '11 at 18:20
  • @Blindy - I think that would be a better solution. – kvb Jan 28 '11 at 18:33
  • @Blindy: When I see "Select" I expect the projected sequence to be entirely different from the original sequence. I'm not following your point; "Select" *is* "map", end of story. You can argue that the name is less intuitive, but the semantics should be the same. – Eric Lippert Jan 28 '11 at 18:37
  • 13
    @Blindy: and I would expect a Haskell programmer to be utterly horrified by what you've done. Haskell is all about avoiding side effects and mutations like this. – Eric Lippert Jan 28 '11 at 18:38
2

Why does it need to be LINQ?

var cumulative = new double[probabilities.Length];
for (int i = 0; i < probabilities.Length; i++)
    cumulative[i] = probabilities[i] + (i == 0 ? 0 : cumulative[i-1]);
Timwi
  • 65,159
  • 33
  • 165
  • 230
2

First of all i don't think that it is good task for Linq. Plain old foreach will do it better. But as a puzzle it is fine.

First idea was to use subqueries, but i don't like it, because it is O(n^2). Here is my linear solution:

        double[] probabilities = new double[] { 0.3, 0.4, 0.3};
        probabilities
            .Aggregate(
                new {sum=Enumerable.Empty<double>(), last = 0.0d},
                (a, c) => new {
                    sum = a.sum.Concat(Enumerable.Repeat(a.last+c,1)),
                    last = a.last + c
                },
                a => a.sum
            );
Andrey
  • 59,039
  • 12
  • 119
  • 163
  • 3
    I know it is seven years later, but I didn't see this solution until just now. Though this solution is linear in *time*, **it is also linear in its usage of stack space**, and that's really bad; a relatively small input array will cause the stack to overflow. Do you see *why* it is linear in stack space? Hint: the overflow is in the code that you *didn't* write. Write that code and see what happens! – Eric Lippert Sep 25 '18 at 17:46
  • @EricLippert that is an interesting remark. I am not sure what could cause linear stack space allocation. Does `Concat` effectively yields the sequence by storing it in the stack? https://github.com/Microsoft/referencesource/blob/master/System.Core/System/Linq/Enumerable.cs#L806 – Andrey Sep 25 '18 at 20:31
  • 1
    Here's a little program. Before you run it, try to predict what the output will be. Then try it; were you surprised? https://dotnetfiddle.net/9akiA5 It should now be clear where the stack is consumed. – Eric Lippert Sep 25 '18 at 20:51
  • @EricLippert yes, seems my guess was correct, `ConcatIterator` uses the stack to traverse the list. Thank you for pointing to it, I had no idea that it is unwise to use `Concat` for long lists. – Andrey Sep 26 '18 at 13:18
  • 1
    Just to clarify your comment: the problem is when you have a large number of concatenations. Concatenating two sequences together is fine. The problem arises when you concatenate one item onto a sequence, and then concatenate one item onto that sequence, and then concatenate one item onto that sequence, and then concatenate one item onto that sequence, … and you see how that goes. We build a chain of objects on the heap which *when iterated over* each triggers a method call, and the *call* goes on the stack. – Eric Lippert Sep 26 '18 at 13:41
  • The net result is: if the sequence is big, you overflow the stack. But even if it is small, think about the cost! Iterating the first item in an n-item list costs you 1 method call. Iterating the second costs you 2, iterating the third costs you 3, and so the total cost is *quadratic*. So it *looks* like you have a linear algorithm here, but in fact it is quadratic when you consider the code you didn't write: the `foreach`. – Eric Lippert Sep 26 '18 at 13:43
1

use RX :

var input=new double[]{ ... }
var output = new List<double>();
input.ToObservable().Scan((e, f) => f + e).Subscribe(output.Add);
1

This is actually pretty straightforward to generalize using generator. Here is a new extension method called Accumulate that works like a combination of Select and Aggregate. It returns a new sequence by applying a binary function to each element in the sequence and accumulated value so far.

 public static class EnumerableHelpers 
 {
    public static IEnumerable<U> Accumulate<T, U>(this IEnumerable<T> self, U init, Func<U, T, U> f) 
    {
        foreach (var x in self)
            yield return init = f(init, x);
    }

    public static IEnumerable<T> Accumulate<T>(this IEnumerable<T> self, Func<T, T, T> f)
    {
        return self.Accumulate(default(T), f);
    }

    public static IEnumerable<double> PartialSums(this IEnumerable<double> self)
    {
        return self.Accumulate((x, y) => x + y);
    }

    public static IEnumerable<int> PartialSums(this IEnumerable<int> self)
    {
        return self.Accumulate((x, y) => x + y);
    }
 }
cdiggins
  • 17,602
  • 7
  • 105
  • 102
1

Here's my solution:

  • Linq
  • linear time
  • linear memory
  • no side effects

Only caveat is that it doesn't work for empty lists (trivial to handle).

    var doublesSummed  = doubles.Skip(1).Aggregate(
        new {
            sum = doubles.First(),
            doubles = new [] {doubles.First()}.AsEnumerable()
        },  
        (acc, nextDouble) => new {
            sum = acc.sum + nextDouble,
            doubles = acc.doubles.Append(acc.sum + nextDouble)
        }
    );

Demo

GaloisGirl
  • 1,476
  • 1
  • 8
  • 14
0

Here's a way of doing it using LINQ:

double[] doubles = { 1.7, 2.3, 1.9, 4.1, 2.9 };
var doublesSummed = new List<double>();

Enumerable.Aggregate(doubles, (runningSum, nextFactor) => {
    double currentSum = runningSum + nextFactor;
    doublesSummed.Add(currentSum);
    return currentSum;
});

doublesSummed.Dump();

In LINQPad:

  • 4
  • 5.9
  • 10
  • 12.9
kamranicus
  • 4,207
  • 2
  • 39
  • 57
-1

Cumulative sum for List<double>:

var nums = new List<double>() { 0.3, 0.0, 0.4, 1.1 };
var cumsum = nums.Aggregate(new List<double> (), 
              (list, next) => { list.Add(list.LastOrDefault() + next); return list; });
Alamakanambra
  • 5,845
  • 3
  • 36
  • 43