1

I'm trying to do some reporting in SQL Server. Here's the basic table setup:

Order (ID, DateCreated, Status)

Product(ID, Name, Price)

Order_Product_Mapping(OrderID, ProductID, Quantity, Price, DateOrdered)

Here I want to create a report to group product with similar amount of sales over a time period like this:

Sales over 1 month:

  1. Coca, Pepsi, Tiger: $20000 average(coca:$21000, pepsi: $19000, tiger: $20000)
  2. Bread, Meat: $10000 avg (bread:$11000, meat: $9000)

Note that the text in () is just to clarify, not need in the report). User define the varying between sales that can consider similar. Example sales with varying lower than 5% are consider similar and should be group together. The time period is also user defined.

I can calculate total sale over a period but has no ideas on how to group them together by sales varying. I'm using SQL Server 2012. Any help is appreciated.

Sorry, my English is not very good :)

UPDATE: *I figured out about what I atually need ;)*

For an known array of numbers like: 1,2,3,50,52,100,102,105

I need to group them into groups which have at least 3 number and the difference between any two items in group is smaller than 10.

For the above array, output should be:

[1,2,3]

[100,102,105]

=> the algorithm take 3 params: the array, minimum items to form a group and maximum difference between 2 items.

How can I implement this in C#?

BinhPV
  • 35
  • 5
  • There's slight problem. Say you have sales 100, 104, 108, 112 an so on, up to 200 - all stay about 2%-4% apart from previous one. How would you group sales in such case? – Arvo Apr 09 '13 at 11:29
  • hi, could you check out my answer, and tell if it works? pretty sure it does~~ – Ethan Li Apr 11 '13 at 11:42

4 Answers4

1

I can't believe I did it~~~

-- this threshold is the key in this query
-- it means that 
-- if the difference between two values are less than the threshold
-- these two values are belong to one group
-- in your case, I think it is 200
DECLARE @th int
SET @th = 200

-- very simple, calculate total price for a time range
;WITH totals AS ( 
  SELECT p.name AS col, sum(o.price * op.quantity) AS val
  FROM order_product_mapping op
  JOIN [order] o ON o.id = op.orderid
  JOIN product p ON p.id = op.productid
  WHERE dateordered > '2013-03-01' AND dateordered < '2013-04-01'
  GROUP BY p.name
),
-- give a row number for each row
cte_rn AS ( -- 
  SELECT col, val, row_number()over(ORDER BY val DESC) rn
  FROM totals
),
-- show starts now,
-- firstly, we make each row knows the row before it 
cte_last_rn AS (
  SELECT col, val, CASE WHEN rn = 1 THEN 1 ELSE rn - 1 END lrn
  FROM cte_rn
),
-- then we join current to the row before it, and calculate 
-- the difference between the total price of current row and that of previous row
-- if the the difference is more than the threshold we make it '1', otherwise '0'
cte_range AS (
  SELECT
    c1.col, c1.val,
    CASE
      WHEN c2.val - c1.val <= @th THEN 0
      ELSE 1
    END AS range,
    rn
  FROM cte_last_rn c1
  JOIN cte_rn c2 ON lrn = rn
),
-- even tricker here,
-- now, we join last cte to itself, and for each row
-- sum all the values (0, 1 that calculated previously) of rows before current row
cte_rank AS (
  SELECT c1.col, c1.val, sum(c2.range) rank
  FROM cte_range c1
  JOIN cte_range c2 ON c1.rn >= c2.rn
  GROUP BY c1.col, c1.val
)
-- now we have properly grouped theres total prices, and we can group on it's rank 
SELECT 
  avg(c1.val) AVG,
  (
    SELECT c2.col + ', ' AS 'data()'
    FROM cte_rank c2
    WHERE c2.rank = c1.rank
    ORDER BY c2.val desc
    FOR xml path('')
  ) product,
  (
    SELECT cast(c2.val AS nvarchar(MAX)) + ', ' AS 'data()'
    FROM cte_rank c2
    WHERE c2.rank = c1.rank
    ORDER BY c2.desc
    FOR xml path('')
  ) price
FROM cte_rank c1
GROUP BY c1.rank
HAVING count(1) > 2

The result will look like:

AVG     PRODUCT     PRICE
28      A, B, C     30, 29, 27
12      D, E, F     15, 12, 10
3       G, H, I     4, 3, 2

for understanding how I did concatenate, please read this: Concatenate many rows into a single text string?

Community
  • 1
  • 1
Ethan Li
  • 1,001
  • 1
  • 13
  • 18
  • No, since what you've done is just checking difference between 2 values. All items in a group should not differ larger than the threshold. Example if the threshold is 3 then 1,2,3 is in a group but 1,2,5 is not since 5-1 = 4 >3. I'm still finding the solution for this. – BinhPV Apr 12 '13 at 08:58
  • for 1,2,5, maybe threshold should be 4? or maybe 1,2 should be in a group, and 5 should be in another group? is it just a matter of choosing appropriate threshold? and how would you determine what should be in a group? – Ethan Li Apr 12 '13 at 12:56
  • i just thought the algorithm here is to calculate the distance between two values that next to each other (as the values have been sorted), kkk, maybe I have got too excited about this.. – Ethan Li Apr 12 '13 at 13:00
  • True, haha, I didn't know 'FOR xml path('')' if not for this question. – Ethan Li Apr 12 '13 at 14:34
  • It is too hard for me to give up on SQL. I just saw your updated question. I think my SQL can generate the output you want. Just need to add 'having count(1) > 2' in the end of the query. For an array of 1,2,3,50,51,100,102,105, if you change the threshold to 3, it will give you two group, 1,2,3 and 100,102,105. Check my updated answer – Ethan Li Apr 12 '13 at 14:43
1

By the way, if you just want c#:

var maxDifference = 10;
var minItems = 3;     

// I just assume your list is not ordered, so order it first
var array = (new List<int> {3, 2, 50, 1, 51, 100, 105, 102}).OrderBy(a => a);

var result = new List<List<int>>();
var group = new List<int>();
var lastNum = array.First();
var totalDiff = 0;
foreach (var n in array)
{
    totalDiff += n - lastNum;

    // if distance of current number and first number in current group
    // is less than the threshold, add into current group
    if (totalDiff <= maxDifference)
    {
        group.Add(n); 
        lastNum = n;
        continue;
    }

    // if current group has 3 items or more, add to final result
    if (group.Count >= minItems)
        result.Add(group);

    // start new group
    group = new List<int>() { n };
    lastNum = n;
    totalDiff = 0;   
}

// forgot the last group...
if (group.Count >= minItems)
    Result.Add(group);

the key here is, the array need to be ordered, so that you do not need to jump around or store values to calculate distances

Ethan Li
  • 1,001
  • 1
  • 13
  • 18
  • There are some bug in this code. I'll post it once I finished fixing :D – BinhPV Apr 12 '13 at 15:24
  • My bad.. I don't have IDE at home, so couldn't try it, sorry for bugs. And I just find out I forget the last group. Have fixed it – Ethan Li Apr 12 '13 at 15:34
  • haha, genius developer :) I was busy helping my girl on her report. Thanks very much :) – BinhPV Apr 12 '13 at 16:20
  • This applies to threshold between 2 number. To have it checking for whole group, we should add a test between current number and first number in the group – BinhPV Apr 12 '13 at 16:27
  • Hey... I was an idiot... I miss read your question... You were saying the max distance between any two numbers in the group... No wonder the SQL doesn't work.. By the way, I have fixed my c# code. I will try SQL later. – Ethan Li Apr 12 '13 at 16:51
0

This query should produce what you expect, it displays products sales for every months for which you have orders :

SELECT CONVERT(CHAR(4), OP.DateOrdered, 100) + CONVERT(CHAR(4), OP.DateOrdered, 120) As Month , 
Product.Name , 
AVG( OP.Quantity * OP.Price ) As Turnover
FROM Order_Product_Mapping OP
INNER JOIN Product ON Product.ID = OP.ProductID
GROUP BY  CONVERT(CHAR(4), OP.DateOrdered, 100) + CONVERT(CHAR(4), OP.DateOrdered, 120) ,
          Product.Name

Not tested, but if you provide sample data I could work on it

jazzytomato
  • 6,994
  • 2
  • 31
  • 44
  • This is what I've done to calculate of each product over a period. What I actually want is group product with similar sales together. As in example coca, pepsi and tiger have similar sales. – BinhPV Apr 09 '13 at 11:03
  • One way that i'm thinking of is to break down the range of sales into smaller range. Example if coca is the best sell product with $1M sales, then we break it into smaller range : 0-100k, 100k-200k..., 900k-1M. Then we can find the mean of total sales in each range, then adding and subtracting the user defined percent to narrow the result. But I think there should be better ways. – BinhPV Apr 09 '13 at 11:09
  • I think it's some sort of clustering analysis? – BinhPV Apr 09 '13 at 11:17
0

Look like I made things more complicate than it should be. Here is what should solve the problem:

-Run a query to get sales for each product.

-Run K-mean or some similar algorithms.

BinhPV
  • 35
  • 5
  • it is doable, kk~~ agree that it was too complicate, but in many case, if you can to it in sql, it will be faster than do it in code – Ethan Li Apr 11 '13 at 11:57