
I would like to create a data structure to capture time series of production, sales, and inventory data. However, not every use case needs to track all of the data. The exact data to track (for example, sales and inventory but not production) is specified when the series is constructed.

One approach could be the following.

struct ProductionDataEntry { ... };
struct SalesDataEntry { ... };
struct InventoryDataEntry { ... };
// Each of the above structs could be arbitrarily large

struct DataEntryV1 {
  ProductionDataEntry pde_;
  SalesDataEntry      sde_;
  InventoryDataEntry  ide_;
};

typedef std::chrono::system_clock::time_point TimePoint;

struct TimeSeriesEntry {
  TimePoint timepoint_;
  DataEntryV1 entry_;
};

std::deque<TimeSeriesEntry> time_series;

The drawback of this approach is that when only sales and inventory data are needed, and not production, every entry still consumes space for ProductionDataEntry.

I am looking for an approach that avoids this wasted space.

Three options come to mind:

  1. Create separate time series for each kind of data and populate only those time series which are necessary. However, this copies TimePoint data multiple times and spoils the locality of data by spreading the collected data over multiple data structures.

  2. Organize DataEntry as pointers to individual data entries, something like

    struct DataEntryV2 {
      ProductionDataEntry * pde_{nullptr};
      SalesDataEntry      * sde_{nullptr};
      InventoryDataEntry  * ide_{nullptr};
    };
    

    and construct only those data entry objects that are necessary. However, this fragments the memory and introduces additional overhead of allocation and deallocation which I would like to avoid if possible.

  3. Organize DataEntry with std::optional, something like

    struct DataEntryV3 {
      std::optional<ProductionDataEntry> pde_;
      std::optional<SalesDataEntry>      sde_;
      std::optional<InventoryDataEntry>  ide_;
    };
    

    I think this adds roughly one extra word per entry type for the engaged flag, and it still consumes the space of the unneeded data.

Are there any other options in the design space?
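To make the space trade-offs of the three layouts concrete, here is a minimal sketch that compares their per-entry sizes. The payload fields below are placeholders I made up (the real structs are elided in the question), so the absolute numbers are only illustrative.

```cpp
#include <cstddef>
#include <optional>

// Placeholder payloads: the field names and sizes are assumptions for illustration.
struct ProductionDataEntry { double production_values[32]; };
struct SalesDataEntry      { double sales_values[4]; };
struct InventoryDataEntry  { double inventory_values[2]; };

// V1: every entry embeds all three payloads.
struct DataEntryV1 {
  ProductionDataEntry pde_;
  SalesDataEntry      sde_;
  InventoryDataEntry  ide_;
};

// V2: every entry holds three pointers; payloads live elsewhere on the heap.
struct DataEntryV2 {
  ProductionDataEntry* pde_{nullptr};
  SalesDataEntry*      sde_{nullptr};
  InventoryDataEntry*  ide_{nullptr};
};

// V3: every entry embeds all three payloads plus an "engaged" flag each.
struct DataEntryV3 {
  std::optional<ProductionDataEntry> pde_;
  std::optional<SalesDataEntry>      sde_;
  std::optional<InventoryDataEntry>  ide_;
};

// Per-entry footprint of each layout (excluding heap allocations made for V2).
constexpr std::size_t size_v1 = sizeof(DataEntryV1);
constexpr std::size_t size_v2 = sizeof(DataEntryV2);
constexpr std::size_t size_v3 = sizeof(DataEntryV3);
```

With these placeholder payloads, V2 stays at three pointers per entry regardless of payload size, while V3 is at least as large as V1 on mainstream standard library implementations, because each std::optional stores its payload inline next to a flag.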

(Note: DataEntry may need to be extended to include new kinds of data, e.g. PreOrderData.)

Jarod42
Arun
  • imho too broad and not really clear. None of the options you mention allows you to easily include new kinds of data. – 463035818_is_not_an_ai Apr 05 '17 at 17:55
  • How many of these things do you have? Normally, the space requirement of a metadata object is trivial in comparison to the size of the data (time series in this case) and a few unused bytes really aren't worth worrying about. – rici Apr 05 '17 at 18:30
  • @tobi303: The kinds of data are all written and available at compile time. Among them, what exactly to include is specified at startup time (think command line arguments). – Arun Apr 05 '17 at 21:54
  • @rici: As mentioned in the question, the size of `ProductionEntry` might be big. So, if production data is not necessary, I do not want to carry the `ProductionEntry` field (i.e. `pde_`) in all the `DataEntry` objects in the `time_series`. – Arun Apr 05 '17 at 21:56

2 Answers

Create separate time series for each kind of data and populate only those time series which are necessary. However, this copies TimePoint data multiple times and spoils the locality of data by spreading the collected data over multiple data structures.

It's possible to accomplish this through inheritance. For example:

struct DataEntryV1: public ProductionDataEntry, public SalesDataEntry {};

You would still need to define each combined data type that you use, but it wouldn't "spoil the locality of the data". As a bonus, you gain code reuse via polymorphism.

I present this simply as an option, and you should read this link about multiple inheritance before you decide to do this.
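As a rough sketch of what the inheritance approach looks like in practice, the snippet below composes two entry types from placeholder payloads. The field names and the helper function are assumptions for illustration only. Note that each combination is a distinct type fixed at compile time.

```cpp
#include <cstddef>

// Placeholder payloads: field names and sizes are assumptions for illustration.
struct ProductionDataEntry { double production_values[32]; };
struct SalesDataEntry      { double units_sold = 0.0; };
struct InventoryDataEntry  { double units_in_stock = 0.0; };

// One combined type per combination of tracked data.
struct SalesInventoryEntry : public SalesDataEntry, public InventoryDataEntry {};
struct FullEntry : public ProductionDataEntry,
                   public SalesDataEntry,
                   public InventoryDataEntry {};

// Hypothetical helper: code written against a base works for any combined
// type that derives from it.
inline double units_sold_of(const SalesDataEntry& entry) {
  return entry.units_sold;
}

// The combined type without production data carries no production payload.
constexpr std::size_t sales_inventory_size = sizeof(SalesInventoryEntry);
constexpr std::size_t full_size = sizeof(FullEntry);
```

The base sub-objects sit inside one combined object, so an entry's data stays contiguous, but choosing between SalesInventoryEntry and FullEntry still happens in the source code rather than at startup.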

Aumnayan
  • not sure if I understood the question correctly, but if OP wants to choose on the fly what a `DataEntry` should contain, then inheritance isn't a solution – 463035818_is_not_an_ai Apr 05 '17 at 17:53
  • To set up the inheritance correctly, I would need to know which data is needed at the time I write the code, wouldn't I? I wanted to postpone the decision until software startup time. – Arun Apr 05 '17 at 21:49

If you know at compile time which data to track, you can use a variadic template:

template <typename ... Ts>
struct TimeSeriesEntry {
    TimePoint timepoint_;
    std::tuple<Ts...> entries_;
};

And then:

// No Production
std::deque<TimeSeriesEntry<SalesDataEntry, InventoryDataEntry>> time_series;
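Since the comments say the selection is only known at startup, one possible way to bridge the gap is to instantiate the template for every combination and dispatch once on the startup flags. The sketch below assumes placeholder payloads and two hypothetical helpers, `collect` and `collect_with`; it is not part of the original answer, and the number of cases doubles with each new kind of data.

```cpp
#include <chrono>
#include <cstddef>
#include <deque>
#include <tuple>

using TimePoint = std::chrono::system_clock::time_point;

// Placeholder payloads: field names and sizes are assumptions for illustration.
struct ProductionDataEntry { double production_values[16]; };
struct SalesDataEntry      { double units_sold; };
struct InventoryDataEntry  { double units_in_stock; };

template <typename... Ts>
struct TimeSeriesEntry {
  TimePoint timepoint_;
  std::tuple<Ts...> entries_;
};

// Hypothetical collection loop: fills a series that stores only Ts... and
// returns the number of bytes used by the stored entries.
template <typename... Ts>
std::size_t collect(std::size_t samples) {
  std::deque<TimeSeriesEntry<Ts...>> time_series;
  for (std::size_t i = 0; i < samples; ++i) {
    TimeSeriesEntry<Ts...> entry{};
    entry.timepoint_ = std::chrono::system_clock::now();
    time_series.push_back(entry);
  }
  return time_series.size() * sizeof(TimeSeriesEntry<Ts...>);
}

// Hypothetical startup dispatch: maps runtime flags (e.g. parsed from the
// command line) to the matching compile-time instantiation.
std::size_t collect_with(bool production, bool sales, bool inventory,
                         std::size_t samples) {
  using P = ProductionDataEntry;
  using S = SalesDataEntry;
  using I = InventoryDataEntry;
  const int mask = (production ? 4 : 0) | (sales ? 2 : 0) | (inventory ? 1 : 0);
  switch (mask) {
    case 7:  return collect<P, S, I>(samples);
    case 6:  return collect<P, S>(samples);
    case 5:  return collect<P, I>(samples);
    case 4:  return collect<P>(samples);
    case 3:  return collect<S, I>(samples);
    case 2:  return collect<S>(samples);
    case 1:  return collect<I>(samples);
    default: return collect<>(samples);
  }
}
```

For example, `collect_with(false, true, true, n)` runs the instantiation whose entries hold no production payload at all, so the unused type costs nothing per entry. The price is that the rest of the program has to live inside, or be templated alongside, `collect`.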
Jarod42