Is it possible to consolidate records from a SELECT statement that have duplicate column values with another record in SQL?

Question

Consider that I have the following tables/entities:

Posts

id	user_id	title	content
1	1	Article 1	Lorem ipsum
2	1	Article 1	Lorem ipsum
3	1	Article 2	Lorem ipsum 2
4	2	Article 3	Lorem ipsum

Users

id	name
1	John Doe
2	Timothy Fisher

Notice that there are two posts from the user with an ID of 1 that have the same title and content. There was an error at the application level that allowed a user to submit a post twice in the past, leading to "duplicate" records.

I'm looking to query all of the posts, but consolidate the posts for each user that have duplicate titles and content.

The ideal result set would look like this:

post_id	author_name	title	content
1	John Doe	Article 1	Lorem ipsum
3	John Doe	Article 2	Lorem ipsum 2
4	Timothy Fisher	Article 3	Lorem ipsum

SELECT
  posts.id as post_id,
  users.name as author_name
  posts.title,
  posts.content
FROM
  posts
INNER JOIN
  users
ON
  posts.user_id = users.id;

Whether or not the query pulled post 1 or 2 for John Doe wouldn't matter. In the actual databases, I have timestamps so I'd likely just pull the latest one.

Is this possible with SQL?

score 4 · Accepted Answer · answered Nov 09 '21 at 20:16

4

you could use a fake aggregation function and group by

SELECT 
  min(posts.id) as post_id,
  users.name as author_name
  posts.title,
  posts.content
FROM  posts
INNER JOIN  users  ON   posts.user_id = users.id
GROUP BY   users.name, posts.title, posts.content

answered Nov 09 '21 at 20:16

ScaisEdge

131,976
10
91
107

I guess I didn't even consider group by. So wait, would this work without the aggregation function? I was able to simply add `group by posts.title, posts.content` and it gave me the correct result set. – Timothy Fisher Nov 09 '21 at 20:32
Oh, I see now the min/max is just to specify which ID to pull, awesome! Much easier than I thought it out to be. – Timothy Fisher Nov 09 '21 at 20:34

gman · Answer 2 · 2021-11-10T07:17:09.160

0

In MySQL you can also do it without aggregation. Yet I'm not sure if it's a good practice (more info here + see comment).

SELECT
  posts.id as post_id,
  users.name as author_name,
  posts.title,
  posts.content
FROM
  posts
INNER JOIN
  users
ON
  posts.user_id = users.id
GROUP BY
  users.name, posts.title, posts.content;

edited Nov 10 '21 at 07:17

answered Nov 09 '21 at 20:46

gman

116
1
4

The post id will then be arbitrary, which is not a great behaviour of a database. Also, in MySQL 8 that functionality is deprecated and turned off by default. It's generally bad practice, usually unnecessary, and is a rotten code smell. – MatBailie Nov 09 '21 at 20:55

Is it possible to consolidate records from a SELECT statement that have duplicate column values with another record in SQL?

2 Answers2