
I am trying to find a good way to handle the scenario below in C++.

When we start the service on the server, a parameter table like the one below is initialized from data in the database.

ID, field_1, field_2, ..., field_50
100abc, ***, ***, ..., ***
120def, ***, ***, ..., ***
...
...
500xyz, ***, ***, ..., ***

Fields/columns: around 50. The count and format of the fields are fixed. Every field is an int, a double, or a char* (the strings are not very long).

Records/rows: at most 200. The number of records differs each time, depending on the data.

ID is unique.

During calculation, the parameter table will be read and updated about 500 times per second (lookups by ID and field name, I assume).

Low latency is important in the system.

What would be the best data structure for such a scenario?

If there are techniques that can greatly improve efficiency when there are no write (update) operations, please share those as well. I think there are workarounds that would let me avoid updating the parameter table.

Thank you so much.

highbury
  • For a fixed number of columns, I would use a (one-dimensional) `std::array<>`; if the number of columns varies, a `std::vector` can be used instead. For each row: if certain columns have a fixed type, I would use a `struct` to model the row data. The `struct` may contain a `std::vector<>` (or `std::array`) member for repeating columns of the same type. If you don't know in general whether certain columns contain numbers (with or without a decimal point) or text, then I would store them all as `std::string`. `std::any` could be an alternative, but I'm not sure it's worth it. (A sketch of this layout follows these comments.) – Scheff's Cat Jan 14 '20 at 06:44
  • Sorry, why the [tag:json] tag? As presented, your input data looks like a CSV file: [SO: How can I read and parse CSV files in C++?](https://stackoverflow.com/questions/1120140/how-can-i-read-and-parse-csv-files-in-c) – Scheff's Cat Jan 14 '20 at 06:48
  • Who is updating your table? Why would you store it on disk 500 times per second? Wouldn't it be better to use a memory-mapped message-passing solution? – Costantino Grana Jan 14 '20 at 06:53
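
A minimal sketch of the row/struct layout suggested in the first comment, assuming the ~50 column types are known at compile time (the field names and counts here are hypothetical):

#include <array>
#include <string>
#include <vector>

// One row: the fixed, known columns are modeled as plain struct members
// (hypothetical field layout for illustration).
struct ParamRow {
    std::string id;                  // unique key, e.g. "100abc"
    int         field_1;
    double      field_2;
    std::string field_3;             // char* columns stored as std::string
    std::array<double, 10> extras;   // repeating columns of the same type
    // ... remaining fields
};

// The whole table: at most ~200 rows, kept contiguous in memory.
std::vector<ParamRow> rows;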

1 Answer


FYI, you're asking an opinionated question about algorithms and data structures, which is normally a better fit for this stack exchange site: https://softwareengineering.stackexchange.com/

Anyway, with all appropriate grains of salt, here's my uninformed opinion. Considering this:

Count and format of fields are fixed.

and this:

Low latency is important in the system.

Consider using a hash map with a perfect hashing function to look up your fields by name. Back in the day you'd use gperf as a build step to generate the hashing function in C, but with C++ constexpr magic you have this option:

https://github.com/Kronuz/constexpr-phf

The documentation there is just so, so unhelpful, so here's how you'd use it. Start by inputting your fields to make the hash function:

// requires the headers from the linked constexpr-phf repo (fnv1ah32 and phf::make_phf)
fnv1ah32 fnv1a{};   // FNV-1a string hasher usable at compile time

// build a perfect hash over the known field names at compile time
constexpr auto fields_phf = phf::make_phf({
    fnv1a("field1"),
    fnv1a("field2"),
    fnv1a("field3"),
    fnv1a("field4")
    /* , ... */
});
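
For reference, this mirrors how the answer queries the phf further down: hash the field name and ask for its slot, with `phf::npos` coming back if the name was never registered:

// look up a field's slot index at runtime; phf::npos means "unknown field"
auto pos = fields_phf.find(fnv1a("field2"));
if (pos != phf::npos) {
    // pos indexes directly into a per-row array of values (shown below)
}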

I don't have any special insight for what to use for values, but since you want to store one of 3 types, I'll use std::variant for this example:

#include <string>
#include <variant>

// ...assuming your field values will fit in std::string's short string optimization
using Value = std::variant<int, double, std::string>;

Then you can wrap an O(1) lookup table around a contiguous array of data:

#include <array>
#include <cstddef>
#include <stdexcept>

// one slot per field name registered in fields_phf above
constexpr std::size_t FIELD_COUNT = 4;

struct Row {
    std::array<Value, FIELD_COUNT> fields;

    // field lookup by name: hash the name, let the perfect hash map it to a
    // slot, and hand back a reference to the stored value
    template <typename T>
    Value& operator [](T&& t) {
        auto pos = fields_phf.find(fnv1a(t));
        if (pos == phf::npos) {
            throw std::runtime_error("unknown field");
        }
        return fields[pos];
    }
};

Then use a regular hash table to look up your rows, which is a pretty good default if you don't know the values beforehand. Reserve 200 rows to minimize rehashing, since you think that's your ceiling:

std::unordered_map<std::string, Row> table;
table.reserve(200);

Then you can do your lookups:

int main() {
    table["row1"]["field1"] = 42;        // Value becomes int
    table["row2"]["field2"] = "hello";   // const char* converts to std::string
}

Demo: https://godbolt.org/z/z8euVY
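
Since each cell is a std::variant, reading a value back takes std::get / std::get_if (or std::visit). A minimal sketch, assuming the table and Row types above; read_back is a hypothetical helper name:

#include <iostream>
#include <variant>

// Read values back out of the variant cells. std::get throws
// std::bad_variant_access if the cell holds a different type;
// std::get_if returns nullptr instead of throwing.
void read_back() {
    int n = std::get<int>(table["row1"]["field1"]);

    if (auto* s = std::get_if<std::string>(&table["row2"]["field2"])) {
        std::cout << n << " " << *s << "\n";
    }
}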

parktomatomi
  • I realized I have another data-usage pattern that I didn't know about before: there will be a lot of calculation based on data from the same column. I did some searching, and it seems that with a map/hash map, getting the data from one column means iterating over all rows. Is that a correct understanding? Is there another good way to read all the data from one column quickly? (A column-scan sketch follows these comments.) – highbury Jan 16 '20 at 01:35
  • Taking a step back, 500 accesses a second is not very demanding. Map benchmarks are usually measured in 100s of thousands to millions of accesses per second, even on older systems. Are you sure a standard `unordered_map` with a `pair` key, or a map of maps wouldn't work well enough for your purposes (with less code to maintain)? – parktomatomi Jan 20 '20 at 11:08
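
To illustrate the column access discussed in the last two comments: with a row-oriented table like the one above, reading one column is a linear scan over the rows, which stays cheap at ≤ 200 rows. A minimal sketch, assuming the table, Row, and Value types from the answer; read_column is a hypothetical helper:

#include <vector>

// Collect one column by visiting every row; with <= 200 rows this is a
// short loop, not an expensive operation.
template <typename Name>
std::vector<Value> read_column(Name&& field_name) {
    std::vector<Value> column;
    column.reserve(table.size());
    for (auto& [id, row] : table) {
        column.push_back(row[field_name]);
    }
    return column;
}

// Usage: auto field1_values = read_column("field1");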