
I have an array that looks like this: `[[(Double,Double)]]`. It's a multi-dimensional array of tuples.

This is data that I will never query on, as it doesn't need to be queried; it only makes sense in that shape on the client side. I'm thinking of storing the entire thing as a string and then parsing it back into the multi-dimensional array.

Would that be a good approach, and would the parsing be very expensive, considering I can have at most 20 arrays with at most 4 inner arrays each, each holding a tuple of 2 Doubles?
How would I check which approach is better, and whether storing it as a multi-dimensional array in PostgreSQL is the better approach?
How would I store it?

j.doe
  • "*would the parsing be very expensive*" - since you would do that outside of the postgres database, only a benchmark in your chosen language/framework/technology can tell. Also measure the serialisation overhead. – Bergi Nov 13 '19 at 01:06
  • I can't see any reason *not* to store it as an actual array in postgres, which will hopefully choose the optimal memory layout for it. Except if you can take a huge advantage of compression after your custom serialisation. That said, 20*4*2 is a really small array, so unless you have thousands of rows with these arrays, you probably [don't need to worry anyway](https://en.wikipedia.org/wiki/Premature_optimisation). – Bergi Nov 13 '19 at 01:10
  • How would I store a multi-dimensional array of tuples, though? I can store a multi-dimensional array of Double, but how would I store (Double,Double)? – j.doe Nov 13 '19 at 01:26
  • Just like Erwin showed below :-) Alternatively, as a three-dimensional array where the third dimension has a size of 2. – Bergi Nov 13 '19 at 01:59
  • Which would be the better approach, memory-wise and performance-wise? And can you give an example of how to store a three-dimensional array with a third dimension of size 2 in PostgreSQL? I have only seen tutorials for two-dimensional arrays. Thanks – j.doe Nov 13 '19 at 02:07
  • Memory-wise: probably the array, as you can hardly get a smaller byte size with a string - as I said, unless you can leverage compression and store it in a binary column. Performance-wise: I don't know, you would have to benchmark, but I doubt there is a large difference. Three-dimensional array: `'{{{1.23,23.4},{3.45,45.6},…},…}'::double precision[]`. – Bergi Nov 13 '19 at 02:12
  • Yes, but which would be better: the dd-type array as Erwin showed, or the three-dimensional array? – j.doe Nov 13 '19 at 02:19
  • I don't think that will make any difference. The composite type might be a bit more difficult to work with, but that depends mostly on your tooling and technology. – Bergi Nov 13 '19 at 02:23

1 Answer


To store an array of composite type (with any nesting level), you need a registered base type to work with. You could have a table defining the row type, or just create the type explicitly:

CREATE TYPE dd AS (a float8, b float8);

Here are some ways to construct that 2-dimensional array of yours:

SELECT ARRAY [['(1.23,23.4)'::dd]]
     , (ARRAY [['(1.23,23.4)']])::dd[]
     , '{{"(1.23,23.4)"}}'::dd[]
     , ARRAY[ARRAY[dd '(1.23,23.4)']]
     , ARRAY(SELECT ARRAY (SELECT dd '(1.23,23.4)'));
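On the client, the text form that Postgres emits for `dd[]` (e.g. `{{"(1.23,23.4)"}}`) eventually has to be parsed back into nested tuples. A minimal sketch in Python (a hypothetical helper, not from the answer; it assumes no braces, quotes, or commas inside the values, which holds for plain numbers):

```python
import re

def parse_dd_array(text):
    """Parse the text form of a 2-D Postgres array of dd composites,
    e.g. '{{"(1.23,23.4)","(3.45,45.6)"},{"(5.0,6.0)"}}',
    into a list of lists of (float, float) tuples."""
    result = []
    # Each inner array is a {...} group one level below the outer braces.
    for inner in re.findall(r'\{([^{}]*)\}', text[1:-1]):
        row = []
        # Each composite value looks like "(x,y)".
        for x, y in re.findall(r'\(([^,]+),([^)]+)\)', inner):
            row.append((float(x), float(y)))
        result.append(row)
    return result
```

The same idea carries over to Swift with `NSRegularExpression` or manual scanning; the point is only that the format is trivially machine-readable.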

Note that the Postgres array type dd[] can store values with any level of nesting.

Whether that's more efficient than just storing the string literal as text very much depends on details of your use case.

  • Array types carry an overhead of 24 bytes plus the usual storage size of the element values.
  • float8 (= double precision) occupies 8 bytes. The text string '1' occupies 2 bytes on disk and 4 bytes in RAM; the text '123.45678' occupies 10 bytes on disk and 12 bytes in RAM.
  • Simple text will be read and written a bit faster than an array type of equal size.
  • Large text values are compressed (automatically), which can benefit storage size (especially with repetitive patterns) - but adds compression / decompression cost.

An actual Postgres array is cleaner in any case, as Postgres does not allow illegal strings to be stored.
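Which variant ends up faster for a given workload can only be settled by measuring. A minimal sketch of such a benchmark in Python (the custom string format here is invented purely for illustration; in practice you would time the equivalent Swift code on real data):

```python
import json
import timeit

# Sample payload matching the question: up to 20 outer arrays,
# up to 4 inner pairs each.
data = [[(1.5 * i, 23.4 * j) for j in range(4)] for i in range(20)]

def roundtrip_text():
    # Custom string serialisation (format made up for this sketch).
    s = ';'.join('|'.join(f'{a},{b}' for a, b in row) for row in data)
    return [[tuple(map(float, p.split(','))) for p in row.split('|')]
            for row in s.split(';')]

def roundtrip_json():
    # JSON as a stand-in for any structured wire format.
    return [[tuple(p) for p in row] for row in json.loads(json.dumps(data))]

print('text:', timeit.timeit(roundtrip_text, number=10_000))
print('json:', timeit.timeit(roundtrip_json, number=10_000))
```

At this payload size (at most 160 doubles per row) the absolute difference is likely to be negligible either way, which is why the choice can be made on cleanliness rather than speed.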

Erwin Brandstetter
  • What do you mean by the last line, "An actual Postgres array is cleaner in any case, as Postgres does not allow illegal strings to be stored"? My use case is that when this data is queried it needs to be in that array on the client as fast as possible. I am using Swift on the client side – j.doe Nov 13 '19 at 01:50
  • The last line means that nobody can sneak illegal data into the database - by accident or with malicious intent. – Erwin Brandstetter Nov 13 '19 at 01:52
  • So what would you recommend for my use case: dd-type arrays or strings? I added my use case in the comment above, and BTW thanks! – j.doe Nov 13 '19 at 01:53
  • I would test actual performance of both variants, and unless plain text is substantially faster, I would go with actual Postgres arrays. – Erwin Brandstetter Nov 13 '19 at 01:57
  • Can you give me any resources on how to test the performance of each? – j.doe Nov 13 '19 at 01:58
  • @j.doe Implement them both, and put a timing on the place where it matters. Then run on real-world data. – Bergi Nov 13 '19 at 02:00
  • Bergi in the comments above suggested using a three-dimensional array. What do you think about that approach, and how is it different from the dd-type array? I'm just trying to go with the best approach here, thanks – j.doe Nov 13 '19 at 02:22
  • @j.doe: The alternative approach with another array instead of the composite type at its core does not need the custom type `dd`, resulting in simpler syntax. Storage size and performance should be identical. Maybe a bit more expensive to translate to/from your original form. – Erwin Brandstetter Nov 13 '19 at 04:52
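For illustration, the translation Erwin mentions - from the nested tuples to a literal for the three-dimensional `float8[]` alternative - could be sketched in Python like this (a hypothetical helper, not part of any answer above; it relies on the standard Postgres array-literal syntax):

```python
def to_3d_literal(data):
    """Render [[(a, b), ...], ...] as a 3-D Postgres array literal
    for float8[], where the third dimension always has size 2,
    e.g. '{{{1.23,23.4},{3.45,45.6}}}'."""
    rows = ','.join(
        '{' + ','.join('{%s,%s}' % (a, b) for a, b in row) + '}'
        for row in data)
    return '{' + rows + '}'
```

The resulting string can be passed as a parameter and cast with `::float8[]` on the server.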