1

I know we are all using a bunch of ORMs for SQL, but I wanted to give a try to native drivers. What is the correct way to map the data after executing Postgres Join? Please see the two queries below.

I have tables:

articles (id, title, text)

comments (id, article_id, text)

The data I want to pass back to the client is an array of articles, and each article has an array of comments:

[{  id: 1,
    title: "Article 1",
    text: "Random text",
    comments: [{
        id: 1,
        article_id: 1,
        text: "Hello",
      }],
  }];

I could achieve this using two different approaches.

SELECT a.id, a.title, c.id, c.text, c.article_id
FROM articles a JOIN "comments" c ON a.id = c.article_id

or

SELECT a.id, a.title, array_agg((c.id, c.text, c.article_id)) AS comments  
FROM articles a 
JOIN comments c ON a.id = c.article_id 
GROUP BY c.article_id
  1. First query: I will need to group data manually on the backend by c.article_id
  2. Second query: It will group it by PostgreSQL and create an array, but is this the correct approach?

I am specifically asking for 'every-day use', which means a bit greater tables. Or maybe you know some other approach? I would love to know if any of these approaches are popular ORMs using because with ORMs you can easily get the data as it is in the example of articles array above

danblack
  • 12,130
  • 2
  • 22
  • 41

1 Answers1

1

ORMs use the first approach for One-to-Many relations. One or several queries execute depending on fetch strategy. Grouping performs on the backend, out of the box by the framework.
For example, hibernate documentation.
This is a common solution because aggregation functions are implemented differently in databases. It is difficult to support a database-specific implementation. So this is an advantage of the first approach that it will work in the same way on any database.

Regarding the second approach, I see only one advantage the count of returned records will be significantly decreased and probably we will save time for communication with DB. The columns of the parent table will not be repeated. But we can't say that one of the approaches will be faster, it depends on query complexity and affected records count. Also, there can be issues connected to high memory consumption by array_agg function.
For example:
Out of memory
memory issue with array_agg
limit causes runaway memory

IMHO:
I would prefer to use the first approach as the main solution for the general case. The aggregation function is good where it can increase performance without high DB memory consumption.

Eugene
  • 5,269
  • 2
  • 14
  • 22