I am trying to built a database based on a table with 5 categorical columns (Business, Product Type, Vendor, Region and Country) and as a measure sales.
The idea is to aggregate the sales amount by all permutation of the 5 categorical columns (Total of 120 different aggregations).
Table example:
Cat A | Cat B | Cat C | Sales |
---|---|---|---|
Orange | Spain | Mercadona | 100 |
Orange | UK | Tesco | 100 |
Strawberry | Ireland | Mercadona | 100 |
Orange | Germany | Mercadona | 100 |
Orange | Spain | Fnac | 100 |
Strawberry | Germany | Saturn | 100 |
Orange | Turkey | CorteIngles | 100 |
Blueberry | France | Fnac | 100 |
Feel free to use the table as bases I can then extrapolate to the database.
Overall my objective with the permutation is to then "quickly" identify where the biggest YoY variances come from and hopefully "automate" some comments/insights on the data based on a correct sequence of A,B,C or D to deep dive.
I have no language preference feel free to use R or Python. I do expect the size of the database greatly increase.
thanks for your time! kind regards, Kylian