1

I have these entries:

id  |  fooddescription
--------------------
1   |  'Mollusks, oyster, eastern (blue point), wild, raw'
2   |  'Mollusks, oyster, eastern (blue point), wild, boiled or steamed'
3   |  'Vegetable oil, olive'
4   |  'Vegetable oil, almond'
5   |  'Pumpkin, boiled, drained, with salt'
6   |  'Pumpkin leaves, boiled, drained, with salt'

I want to treat the first two entries as one because they only have different preparation method, and distinct the others. The words in the string are ordered from general to specific, and the last part (when it has many descriptions and ,) is usually the preparation method that doesn't need distinction.

Desired result:

id  |  fooddescription
--------------------
1   |  'Mollusks, oyster, eastern (blue point), wild, '
3   |  'Vegetable oil, olive'
4   |  'Vegetable oil, almond'
5   |  'Pumpkin, boiled, drained, '
6   |  'Pumpkin leaves, boiled, drained, '

First I thought I could trim the string to remove the part after the last comma. So according to this MySQL answer, I made a postgres script:

SELECT reverse(
            substring(reverse(fooddescription),
                      position(',' in reverse(fooddescription)))) as trimmed, count(*)
FROM food_name
GROUP BY trimmed HAVING COUNT(*)>0 

I'll get this result:

'Mollusks, oyster, eastern (blue point), wild,'
'Vegetable oil,'
'Pumpkin, boiled, drained,'
'Pumpkin leaves, boiled, drained,'

"Vegetable oil," is not desirable and I couldn't keep the id.

So my question is:

  1. How to judge the number of delimiters , and only trim the last part if there are multiple delimiters?
  2. Also, is it possible to keep one id for each group after GROUP BY?
Luobster
  • 27
  • 3
  • 11

2 Answers2

2

instead of position based substring, you could split the text to array & count the number of elements.

here's a complete example:

WITH food_name (fooddescription) AS (
VALUES
  ('Mollusks, oyster, eastern (blue point), wild, raw'),
  ('Mollusks, oyster, eastern (blue point), wild, boiled or steamed'),
  ('Vegetable oil, olive'),
  ('Vegetable oil, almond'),
  ('Pumpkin, boiled, drained, with salt'), 
  ('Pumpkin leaves, boiled, drained, with salt')
)
SELECT ARRAY_TO_STRING(trimmed.trimmed, ', ')
FROM food_name
, LATERAL (SELECT STRING_TO_ARRAY(fooddescription, ', ') parts) parts
, LATERAL (SELECT CASE WHEN array_length(parts, 1) <= 2 THEN parts ELSE parts[1:array_length(parts, 1)-1] END trimmed) trimmed

This returns the following resultset:

                trimmed
Mollusks, oyster, eastern (blue point), wild
Mollusks, oyster, eastern (blue point), wild
Vegetable oil, olive
Vegetable oil, almond
Pumpkin, boiled, drained
Pumpkin leaves, boiled, drained
Haleemur Ali
  • 26,718
  • 5
  • 61
  • 85
  • Very nice, thank you! How about my second question, is it possible to keep one primary key (id) for each group after `GROUP BY`? – Luobster Mar 25 '19 at 13:36
  • I'm not sure what you mean by your second question. What do you want to keep as your primary key? After the `groupby` the column[s] grouped by are guaranteed to be unique. – Haleemur Ali Mar 25 '19 at 13:39
  • Sorry I wasn't clear. I meant each row has another column (`id`) as their primary key. I want to keep one `id` to represent the group (e.g. one id for two entries of the Mollusks, oyster). It is not possible to directly `select` the `id` column when using `GROUP BY`. – Luobster Mar 25 '19 at 13:48
  • 1
    if you have numeric ids, you could use `min` or `max` to pick 1 id when you are grouping. – Haleemur Ali Mar 25 '19 at 15:08
0

Replace everything after and including the last comma using regexp_replace:

select regexp_replace(
  'Mollusks, oyster, eastern (blue point), wild, raw',
  ',[^,]*$', ''
);
select regexp_replace(
  'Mollusks, oyster, eastern (blue point), wild, boiled or steamed',
  ',[^,]*$', ''
);

Output for both:

+----------------------------------------------+
| regexp_replace                               |
|----------------------------------------------|
| Mollusks, oyster, eastern (blue point), wild |
+----------------------------------------------+
  • It's more elegant than the `reverse` hack but works the same. I want it to only trim the last part if there are multiple delimiters (`,`). – Luobster Mar 25 '19 at 13:20
  • What do you mean by multiple delimiters? Can you provide an example where you don't want to trim? –  Mar 25 '19 at 13:21
  • One delimiter, don't trim: "Vegetable oil, olive", "Vegetable oil, almond". Multiple delimiter, do trim: the Mullusks example. – Luobster Mar 25 '19 at 13:28