5

Here is the table structure:

+----+-----------------------------+-----------+
| id |       post_content          | edited_id |
+----+-----------------------------+-----------+
| 1  | content 1                   | NULL      |
| 2  | content 2                   | NULL      |
| 3  | content 1 (edited)          | 1         |
| 4  | content 3                   | NULL      |
| 5  | content 4                   | NULL      |
| 6  | content 4 (edited)          | 5         |
| 7  | content 1 (edited)          | 1         |
+----+-----------------------------+-----------+

Now I want to select the latest edited version of each post. So this is expected result:

+----+-----------------------------+-----------+
| id |       post_content          | edited_id |
+----+-----------------------------+-----------+
| 7  | content 1 (edited)          | 1         |
| 2  | content 2                   | NULL      |
| 4  | content 3                   | NULL      |
| 6  | content 4 (edited)          | 5         |
+----+-----------------------------+-----------+

How can I do that?

Martin AJ
  • 6,261
  • 8
  • 53
  • 111

5 Answers5

3

SQL Fiddle to see how it works.

First we're removing the " (edited)" string part (where needed) from post_content to preapre a column for group by, then calculating maximum id per our group and finally joining back to the same table to retrieve values for a particular id.

SELECT
  goo.id,
  qa.post_content,
  qa.edited_id
FROM (
  SELECT
    MAX(id) AS id,
    post_content
  FROM (
    SELECT
      id,
      CASE WHEN locate(' (edited)', post_content) <> 0
           THEN left(post_content, locate(' (edited)', post_content) - 1)
           ELSE post_content
           END AS post_content,
      edited_id
    FROM qa
    ) foo
  GROUP BY post_content
  ) goo
  INNER JOIN qa ON goo.id = qa.id

Output

+----+-----------------------------+-----------+
| id |       post_content          | edited_id |
+----+-----------------------------+-----------+
| 7  | content 1 (edited)          | 1         |
| 2  | content 2                   | NULL      |
| 4  | content 3                   | NULL      |
| 6  | content 4 (edited)          | 5         |
+----+-----------------------------+-----------+

CASE explained

-- if post_content contains " (edited)" then locate() will return value of it's position
CASE WHEN locate(' (edited)', post_content) <> 0
  -- we're removing the " (edited)" part by doing left()-1 because we want the string to finish before first character of " (edited)"
     THEN left(post_content, locate(' (edited)', post_content) - 1)
  -- if post_content doesn't contain " (edited)" then simply return post_content
     ELSE post_content
     END AS post_content
Kamil Gosciminski
  • 16,547
  • 8
  • 49
  • 72
  • Odd .. where `id = 7` the content is `content 1 (edited)` .. But in the result, where `id = 7`, the content is `content 1`. – Martin AJ Aug 27 '16 at 20:54
  • Good point! Fixed. Just needed to change `goo.post_content` to `qa.post_content`. This is because we truncated the " (edited)" part and thus we need to get the original value from the table the same way we do for `edited_id` – Kamil Gosciminski Aug 27 '16 at 20:55
  • will work but with poor performance (especially for large tables), since you are using string functions and many inner queries – Nir Levy Aug 27 '16 at 21:10
  • @NirLevy have you tested it on a "large table"? I don't think so. RDBMS are built for join operation and we're only doing a self-join once. Most time would be consumed by `locate()` function, but I would argue that the performance would be "poor." – Kamil Gosciminski Aug 27 '16 at 21:21
3

This should work:

select qa.*
from(
  select max(id) as max_post_id
  from qa
  group by coalesce(edited_id, id)
) maxids
join qa on qa.id = maxids.max_post_id

http://sqlfiddle.com/#!9/6018b2/1

If you want to order by edited_id:

select qa.*
from(
  select max(id) as max_post_id, coalesce(edited_id, id) as edited_id
  from qa
  group by coalesce(edited_id, id)
) maxids
join qa on qa.id = maxids.max_post_id
order by maxids.edited_id

http://sqlfiddle.com/#!9/6018b2/9

Paul Spiegel
  • 30,925
  • 5
  • 44
  • 53
3

Like @PaulSpiegel suggested in the comments, I think a small change in the table structure will make your life much easier.
I suggest you'll change your edited_id with content_id, and give it value on all records, including the original one.

Your data would look like this:

+----+-----------------------------+------------+
| id |       post_content          | content_id |
+----+-----------------------------+------------+
| 1  | content 1                   | 1          |
| 2  | content 2                   | 2          |
| 3  | content 1 (edited)          | 1          |
| 4  | content 3                   | 3          |
| 5  | content 4                   | 4          |
| 6  | content 4 (edited)          | 4          |
| 7  | content 1 (edited)          | 1          |
+----+-----------------------------+------------+

By that, you can use the content_id as an identifier of the content, which save you the need for dealing with null values

The query will look like:

SELECT A.* from myTable as A INNER JOIN (
  SELECT max(id) as id, content_id from myTable group by content_id
) as B ON A.id = B.id 
 AND A.content_id = B.content_id
Nir Levy
  • 12,750
  • 3
  • 21
  • 38
  • Sounds good .. upvote .. But you know, [this](http://stackoverflow.com/questions/39177757/how-can-i-select-the-last-edited-version-of-the-post) is my full-question. Do you think will your approach work on the table which is in that question? – Martin AJ Aug 27 '16 at 21:09
  • @MartinAJ - yeah, it should work for that as well, I guess you need to add the `type` column to the inner query as well to get both types (both to the selected columns, to be used in the join, and to the group by) – Nir Levy Aug 27 '16 at 21:16
  • In your table structure, I need to put the id of itself into `content_id` column when a row isn't edited. In other word, how can I insert a row and set `id` value also as `content` value in the same time? – Martin AJ Aug 29 '16 at 19:04
2

try this:

select * from qa
where id in (
    select max(id) from qa
        group by case when edited_id is not null then edited_id
        else id end
) 
order by case 
    when edited_id is null then id 
    else edited_id end;

and here's the fiddle:

http://sqlfiddle.com/#!9/6018b2/10/0

Explanation

The inner query is the one who does all the work, outside query is only for ordering the results.

Inner query says that the grouping should be done on the rows that contain edited_id, it should group by edited_id, if it's not edited, it should group by the id. And of course, we asked to return the max(id) when grouping.

Taha Paksu
  • 15,371
  • 2
  • 44
  • 78
1
select max(id), post_content, edited_id from post where edited_id is not null group by edited_id
union
select id, post_content, edited_id from post where edited_id is null and id not in (select edited_id from post where edited_id is not null)
Andrej
  • 7,474
  • 1
  • 19
  • 21