4

This is to process stock data; the data is in this format:

public class A
{
    public int Price;
    public int Available;
}

let's take this data for example:

var items = new List<A>
{
    new A { Price = 10, Available = 1000 },
    new A { Price = 15, Available = 500 },
    new A { Price = 20, Available = 2000 },
};

my query returns the average price for a specific volume, so for example:

  • if I have a requested volume of 100, my average price is 10

  • if I have a requested volume of 1200, I take the first 1000 at a price of 10, then the next 200 at a price of 15 etc

I have implemented that in C#, but I am trying to find if this could be done with LINQ directly with the database iterator.

I get data that is already sorted by price, but I don't see how to solve this without iteration.


Edit:

this is the code:

public static double PriceAtVolume(IEnumerable<A> Data, long Volume)
{
    var PriceSum = 0.0;
    var VolumeSum = 0L;

    foreach (var D in Data)
    {
        if (D.Volume < Volume)
        {
            PriceSum += D.Price * D.Volume;
            VolumeSum += D.Volume;
            Volume -= D.Volume;
        }
        else
        {
            PriceSum += D.Price * Volume;
            VolumeSum += Volume;
            Volume = 0;
        }

        if (Volume == 0) break;
    }

    return PriceSum / VolumeSum;
}

and the test code:

var a = new List<A>
{
    new A { Price = 10, Volume = 1000 },
    new A { Price = 15, Volume = 500 },
    new A { Price = 20, Volume = 2000 }
};

var P0 = PriceAtVolume(a, 100);
var P1 = PriceAtVolume(a, 1200);

Clarification:

Above I said I'd like to move it to LINQ to use the database iterator, so I'd like to avoid scanning the entire data and stop iterating when the answer is calculated. The data is already sorted by price in the database.

Uwe Keim
  • 39,551
  • 56
  • 175
  • 291
Thomas
  • 10,933
  • 14
  • 65
  • 136
  • 1
    Can you post your iterative code please? Maybe it is not possible without iteration. And don't forget that behind the scenes LINQ also uses iteration where needed ;) – Youp Bernoulli Jun 19 '19 at 17:01
  • 1
    your example data isn't clear, requested volume of 100 -> 10, doesn't match up with sample data `{10, 1000}` – Maslow Jun 19 '19 at 17:03
  • 1
    I updated the sample with something that compiles, but can you please post this: *"I have implemented that in C#"* – Rufus L Jun 19 '19 at 17:04
  • That's because 100 < 1000 and the price is 10. When requested volume is 1200 the price would be (1000*10 + 200*15) / 1200 I suppose. – Youp Bernoulli Jun 19 '19 at 17:05
  • @Maslow: yes, I need a volume of 100, it's covered by the 1000 available at the price of 10 – Thomas Jun 19 '19 at 17:05
  • @Thomas it sounds like you just need to pull records with a takewhile – johnny 5 Jun 19 '19 at 19:21
  • so it would be exactly like the C# code, just with takewhile instead of the foreach, right? I'm looking to see if linq can be pushed to do this kind of things :) – Thomas Jun 19 '19 at 19:24
  • @Thomas you say "yes, I need a volume of 100, it's covered by the 1000 available at the price of 10", but it's also covered by the `Available` volume 500 -> at price 15, do you take the lowest price? – mshwf Jun 19 '19 at 19:26
  • @mshwf it appears you give out the items at the best price avaialble – johnny 5 Jun 19 '19 at 19:27
  • @Thomas the only issue with take while is I'm not sure if the expression will be able to translate a closure because you will need to update the taken variable – johnny 5 Jun 19 '19 at 19:28
  • @Thomas please update you're question to reflect that you want this from the DB, it will be more clearer to user and again a lot more attention – johnny 5 Jun 19 '19 at 19:38
  • You can't execute such a query in DB using linq. If you're expecting large amount of data then you can get records as batches, process each batch as a collection of records and get another batch ..etc. until you get a result, you can use linq `TakeWhile` function on those batches to build the final result. – Ammar Jun 19 '19 at 20:01
  • @johnny 5, I updated the question a little while back with a clarification at the bottom – Thomas Jun 19 '19 at 20:06
  • @Ammar so the 2 options are full scan of the data if I want to use linq or takewhile if I want to just pull what’s needed, right? – Thomas Jun 19 '19 at 20:07
  • `TakeWhile` doesn't work on the database, so you would still have to pull all the data. The only solution involves calculating the running total on the DB side, which can't be done efficiently in LINQ. How much data might be in your database? With about 9,000 rows, any condition on the running field ends up taking 5.5 seconds whether you do it on the database or pull the data when computing the running field on the database server. – NetMage Jun 19 '19 at 21:18

4 Answers4

3

This is probably the most Linqy you can get. It uses the Aggregate method, and specifically the most complex of the three overloaded versions of Aggregate, that accepts three arguments. The first argument is the seed, and it is initialized with a zeroed ValueTuple<long, decimal>. The second argument is the accumulator function, with the logic to combine the seed and the current element into a new seed. The third argument takes the final accumulated values and projects them to the desirable average.

public static decimal PriceAtVolume(IEnumerable<A> data, long requestedVolume)
{
    return data.Aggregate(
        (Volume: 0L, Price: 0M), // Seed
        (sum, item) => // Accumulator function
        {
            if (sum.Volume == requestedVolume)
                return sum; // Goal reached, quick return

            if (item.Available < requestedVolume - sum.Volume)
                return // Consume all of it
                (
                    sum.Volume + item.Available,
                    sum.Price + item.Price * item.Available
                );

            return // Consume part of it (and we are done)
            (
                requestedVolume,
                sum.Price + item.Price * (requestedVolume - sum.Volume)
            );
        },
        sum => sum.Volume == 0M ? 0M : sum.Price / sum.Volume // Result selector
    );
}

Update: I changed the return type from double to decimal, because a decimal is the preferred type for currency values.

Btw in case that this function is called very often with the same data, and the list of data is huge, it could be optimized by storing the accumulated summaries in a List<(long, decimal)>, and applying BinarySearch to quickly find the desirable entry. It becomes complex though, and I don't expect that the prerequisites for the optimization will come up very often.

Theodor Zoulias
  • 34,835
  • 7
  • 69
  • 104
0

You could do something to generate the items' prices as a sequence. e.g.

public class A
{
    public int Price;
    public int Available;
    public IEnumerable<int> Inv => Enumerable.Repeat(Price, Available);
}

var avg1 = items.SelectMany(i => i.Inv).Take(100).Average(); // 10
var avg2 = items.SelectMany(i => i.Inv).Take(1200).Average(); // 10.8333333333333
maccettura
  • 10,514
  • 3
  • 28
  • 35
ericpat
  • 94
  • 5
  • 1
    that wouldn't work because it requires two passes over the data; I'm trying to use linq with the db iterator to only pull as much as required (we're talking about a large amount of data changing very quickly) – Thomas Jun 19 '19 at 17:26
  • @Thomas in your example its clear you're pulling from a list, if you're trying to use DB iterator you should have displayed that in your original question – johnny 5 Jun 19 '19 at 19:17
  • @johnny5: that's the goal, I would like to pull from the DB iterator in order to not pull the whole data. The C# code is just to illustrate the algorithm. – Thomas Jun 19 '19 at 19:19
  • @Thomas - This is still a single pass over the data. Why do you think it is two? – Enigmativity Jun 26 '19 at 11:49
  • After looking at it again, i think you were right: i was concerned that you needed a pass to make the enumerable and one to do the calculations but the result is computed with the enumerable so yes, you are right and it should be a single pass – Thomas Jun 26 '19 at 17:55
0

This is working as well (although not a one-liner):

private static decimal CalculateWeighedAverage(List<A> amountsAndPrices, int requestedVolume)
{
    int originalRequestedVolume = requestedVolume;

    return (decimal)amountsAndPrices.Sum(amountAndPrice =>
    {
        int partialResult = Math.Min(amountAndPrice.Available, requestedVolume) * amountAndPrice.Price;

        requestedVolume = Math.Max(requestedVolume - amountAndPrice.Available, 0);

        return partialResult;
    }) / originalRequestedVolume;
}

Take the sum of price * available as long as the requested volume is bigger than 0 and subtracting the amount of every item in the list in each "sum iteration". Finally divide by the original requested volume.

Youp Bernoulli
  • 5,303
  • 5
  • 39
  • 59
0

I think the best you can do with LINQ is minimize the running total computation done on the server and compute most of it on the client, but minimize the amount downloaded from the server.

I assume the items are already projected down to the two minimum columns (Price, Availability). If not, a Select can be added before pulling the data from the database into orderedItems.

// find price of last item needed; worst case there won't be one
var lastPriceItem = items.Select(i => new { i.Price, RT = items.Where(it => it.Price <= i.Price).Sum(it => it.Available) }).FirstOrDefault(irt => irt.RT > origReqVol);

// bring over items below that price
var orderedItems = items.OrderBy(i => i.Price).Where(i => i.Price <= lastPriceItem.Price).ToList();
// compute running total on client
var rtItems = orderedItems.Select(i => new {
    Item = i,
    RT = orderedItems.Where(i2 => i2.Price <= i.Price).Sum(i2 => i2.Available)
});

// computer average price
var reqVol = origReqVol;
var ans = rtItems.Select(irt => new { Price = irt.Item.Price, Quantity = Math.Min((reqVol -= irt.Item.Available)+irt.Item.Available, irt.Item.Available) })
                     .Sum(pq => pq.Price * pq.Quantity) / (double)origReqVol;
NetMage
  • 26,163
  • 3
  • 34
  • 55