Consider that I have the following tables/entities:
Posts
id | user_id | title | content |
---|---|---|---|
1 | 1 | Article 1 | Lorem ipsum |
2 | 1 | Article 1 | Lorem ipsum |
3 | 1 | Article 2 | Lorem ipsum 2 |
4 | 2 | Article 3 | Lorem ipsum |
Users
id | name |
---|---|
1 | John Doe |
2 | Timothy Fisher |
Notice that there are two posts from the user with an ID of 1 that have the same title and content. There was an error at the application level that allowed a user to submit a post twice in the past, leading to "duplicate" records.
I'm looking to query all of the posts, but consolidate the posts for each user that have duplicate titles and content.
The ideal result set would look like this:
post_id | author_name | title | content |
---|---|---|---|
1 | John Doe | Article 1 | Lorem ipsum |
3 | John Doe | Article 2 | Lorem ipsum 2 |
4 | Timothy Fisher | Article 3 | Lorem ipsum |
Here is the query I have so far; it returns every post, including the duplicates:

SELECT
    posts.id AS post_id,
    users.name AS author_name,
    posts.title,
    posts.content
FROM
    posts
INNER JOIN
    users
ON
    posts.user_id = users.id;
Whether the query pulls post 1 or post 2 for John Doe wouldn't matter. In the actual database I have timestamps, so I'd likely just pull the latest one.
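My rough idea was to keep a single id per (user_id, title, content) group, for example with MIN(id) in a subquery, and then join to users as before. Something like the sketch below, using only the sample columns shown above (in the real schema I'd pick the row by timestamp instead), but I'm not sure whether this is the right approach or if there's something cleaner:

SELECT
    posts.id AS post_id,
    users.name AS author_name,
    posts.title,
    posts.content
FROM
    posts
INNER JOIN
    users
ON
    posts.user_id = users.id
WHERE
    posts.id IN (
        -- keep one id per duplicate group (lowest id here; latest timestamp in the real schema)
        SELECT MIN(id)
        FROM posts
        GROUP BY user_id, title, content
    );

Against the sample data this should return posts 1, 3, and 4, which matches the ideal result set above.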
Is this possible with SQL?