0

Given the scenario that I would like to store sales of a store in a table in a database in Google Cloud Spanner. That table would (in addition to the sale ID as the primary key) have to store a list of product IDs sold, along with the quantity of each product sold. Ideally, this would be in the form of a column such as

productsSold ARRAY<product OBJECT <productID INT64, quantity INT64>>

I.e. each element in the array would be an object containing the product ID along with the quantity of the corresponding product sold. I saw the STRUCT datatype in Google Cloud Spanner, which would be ideal, but it seems you can't create columns using that.

Alternatively, it could just be stored in two separate arrays as

productIDs ARRAY
quantities ARRAY

But this has the possibility of the arrays being out of sync as each element in productIDs isn't directly linked to a corresponding element in quantities.

What would you recommend is the ideal way to store this data according to the best database conventions and through what is possible with Google Cloud Spanner? Thanks in advance!

Christiaan Louw
  • 107
  • 2
  • 11
  • [Is storing a delimited list in a database column really that bad?](https://stackoverflow.com/q/3653462/3404097) – philipxy Mar 26 '22 at 07:19

1 Answers1

1

First, I would normally design this to use two interleaved tables, so something like this:

CREATE TABLE Products (
  ProductId STRING(36) NOT NULL,
  Name STRING(200),
) PRIMARY KEY (ProductId);

CREATE TABLE Sales (
  SaleId STRING(36) NOT NULL,
  CustomerId STRING(36) NOT NULL,
) PRIMARY KEY (SaleId);

CREATE ProductSales (
  SaleId STRING(36) NOT NULL,
  ProductSaleId INT64 NOT NULL,
  ProductId STRING(36) NOT NULL,
  Quantity INT64,
) PRIMARY KEY (SaleId, ProductSaleId), INTERLEAVE IN PARENT (Sales);

If you for some reason must/want to store them in a single table using arrays, then you could use a CHECK CONSTRAINT to ensure the arrays have the same size:

CREATE TABLE Sales (
  SaleId STRING(36) NOT NULL,
  ProductIds ARRAY<STRING(36)>,
  Quantities ARRAY<INT64>,
  CONSTRAINT arrays_equal_length CHECK(ARRAY_LENGTH(ProductIds) = ARRAY_LENGTH(Quantities)),
) PRIMARY KEY ProductId;
Knut Olav Løite
  • 2,964
  • 7
  • 19
  • Thanks! I'll look into interleaved tables. Your suggestion seems to have a sale containing only a single productID though, where I need it to contain a list of productIDs. Or am I missing something? – Christiaan Louw Mar 25 '22 at 09:45
  • 1
    Yeah, you're right, the example was a little incomplete. I've added an extra table to clarify. The ProductSales table is the one that should be interleaved with the Sales table, and there is one record in the ProductSales table for each product that is sold. – Knut Olav Løite Mar 25 '22 at 13:52
  • Thanks that makes sense! Just a bonus question: Would it be effectively the same, or at least sufficient, to use the SaleId as a foreign key pointing to the sale, instead of interleaving it with the Sales table? – Christiaan Louw Mar 28 '22 at 07:42
  • It depends on how you will access the data. Interleaving ProductSales into Sales instructs Cloud Spanner to physically store all ProductSales records together with its parent Sales record. This makes joins between ProductSales and Sales a lot more efficient. Creating a foreign key constraint between ProductSales and Sales will also ensure that the relationship between the two is guaranteed, BUT it does not guarantee that the data will be stored physically close to each other. That will make joins between the two tables less efficient. – Knut Olav Løite Mar 28 '22 at 11:13