3

Consider that I have the following tables/entities:

Posts

id user_id title content
1 1 Article 1 Lorem ipsum
2 1 Article 1 Lorem ipsum
3 1 Article 2 Lorem ipsum 2
4 2 Article 3 Lorem ipsum

Users

id name
1 John Doe
2 Timothy Fisher

Notice that there are two posts from the user with an ID of 1 that have the same title and content. There was an error at the application level that allowed a user to submit a post twice in the past, leading to "duplicate" records.

I'm looking to query all of the posts, but consolidate the posts for each user that have duplicate titles and content.

The ideal result set would look like this:

post_id author_name title content
1 John Doe Article 1 Lorem ipsum
3 John Doe Article 2 Lorem ipsum 2
4 Timothy Fisher Article 3 Lorem ipsum
SELECT
  posts.id as post_id,
  users.name as author_name
  posts.title,
  posts.content
FROM
  posts
INNER JOIN
  users
ON
  posts.user_id = users.id;

Whether or not the query pulled post 1 or 2 for John Doe wouldn't matter. In the actual databases, I have timestamps so I'd likely just pull the latest one.

Is this possible with SQL?

Timothy Fisher
  • 1,001
  • 10
  • 27

2 Answers2

4

you could use a fake aggregation function and group by

SELECT 
  min(posts.id) as post_id,
  users.name as author_name
  posts.title,
  posts.content
FROM  posts
INNER JOIN  users  ON   posts.user_id = users.id
GROUP BY   users.name, posts.title, posts.content
ScaisEdge
  • 131,976
  • 10
  • 91
  • 107
  • I guess I didn't even consider group by. So wait, would this work without the aggregation function? I was able to simply add `group by posts.title, posts.content` and it gave me the correct result set. – Timothy Fisher Nov 09 '21 at 20:32
  • Oh, I see now the min/max is just to specify which ID to pull, awesome! Much easier than I thought it out to be. – Timothy Fisher Nov 09 '21 at 20:34
0

In MySQL you can also do it without aggregation. Yet I'm not sure if it's a good practice (more info here + see comment).

SELECT
  posts.id as post_id,
  users.name as author_name,
  posts.title,
  posts.content
FROM
  posts
INNER JOIN
  users
ON
  posts.user_id = users.id
GROUP BY
  users.name, posts.title, posts.content;
gman
  • 116
  • 1
  • 4
  • The post id will then be arbitrary, which is not a great behaviour of a database. Also, in MySQL 8 that functionality is deprecated and turned off by default. It's generally bad practice, usually unnecessary, and is a rotten code smell. – MatBailie Nov 09 '21 at 20:55