0

I am trying to condense a table which contains multiple rows per event to a smaller table which contains counts of key sub-events within each event. Events are defined based on unique combinations across columns.

As a specific example, say I have the following data involving customer visits to various stores on different dates with different items purchased:

cust date   store   item_type
a    1     Main St  1
a    1     Main St  2
a    1     Main St  2
a    1     Main St  2
b    1     Main St  1
b    1     Main St  2
b    1     Main St  2
c    1     Main St  1
d    2     Elm St   1
d    2     Elm St   3
e    2     Main St  1
e    2     Main St  1
a    3     Main St  1
a    3     Main St  2

I would like to restructure the data to a table that contains a single line per customer visit on a given day, with appropriate counts. I am trying to understand how to use SQLite to condense this to:

Index   cust   date   store     n_items    item1    item2    item3     item4
1         a      1   Main St        4        1        3        0        0
2         b      1   Main St        3        1        2        0        0
3         c      1   Main St        1        1        0        0        0
4         d      2   Elm St         2        1        0        1        0
5         e      2   Main St        2        2        0        0        0
6         a      3   Main St        2        1        1        0        0

I can do this in excel for this trivial example (begin with sumproduct( cutomer * date) as suggested here, followed by cumulative sum on this column to generate Index, then countif and countifs to generate desired counts).

Excel is poorly suited to doing this for thousands of rows, so I am looking for a solution using SQLite.

Sadly, my SQLite kung-fu is weak.

I think this is the closest I have found, but I am having trouble understanding exactly how to adapt it.

When I tried a more basic approach to begin by generating a unique index:

CREATE UNIQUE INDEX ui ON t(cust, date);

I get:

Error: indexed columns are not unique

I would greatly appreciate any help with where to start. Many thanks in advance!

Community
  • 1
  • 1
pablo honee
  • 105
  • 1
  • 7

1 Answers1

0

To create one result record for each unique combination of column values, use GROUP BY.

The number of records in the group is available with COUNT.

To count specific item types, use a boolean expression like item_type=x, which returns 0 or 1, and sum this over all records in the group:

SELECT cust,
       date,
       store,
       COUNT(*) AS n_items,
       SUM(item_type = 1) AS item1,
       SUM(item_type = 2) AS item2,
       SUM(item_type = 3) AS item3,
       SUM(item_type = 4) AS item4
FROM t
GROUP BY cust,
         date,
         store
CL.
  • 173,858
  • 17
  • 217
  • 259
  • This is wonderful. I did not realize the ability to use multiple fields in the GROUP BY statement, but I think this has the potential to solve many of the related problems I am working on. THANK YOU! – pablo honee Apr 07 '14 at 12:28