0

I have a COLUMN in a Postgres table which has values with the following pattern.

INPUT-X

INPUT-X-Y

INPUT-X-Y-Z

INPUT-X-Y-Z-A-...

I am trying to count the number of entries in COLUMN where the value begins with INPUT and contains a given number of hyphens. So if the input value is 2023-08-04T09:00:32.822+00:00, I want to count all of the entries in COLUMN which begin with 2023-08-04T09:00:32.822+00:00 and contain 3 hyphens. If values in COLUMN include:

2023-08-04T09:00:32.822+00:00-1 
2023-08-04T09:00:32.822+00:00-1-1 
2023-08-04T09:00:32.822+00:00-2 
2023-08-04T09:00:32.822+00:00-2-1 
2023-08-04T09:00:32.822+00:00-3
2023-08-04T09:00:32.822+00:00-4

The count of this operation should equal 4. (The purpose of this is to calculate the index for the next value of the series, which in this case would be 2023-08-04T09:00:32.822+00:00-5. Each of the values are a reference to a parent entry e.g. 2023-08-04T09:00:32.822+00:00-5 signifies the 5th reply to entry 2023-08-04T09:00:32.822+00:00. Likewise, 2023-08-04T09:00:32.822+00:00-2-3 signifies the 3rd reply to entry 2023-08-04T09:00:32.822+00:00-2.)

The code I'm currently using is:

select count(*) from messages_dev where ref like input || '%-' || repeat('-', hyphens) || '%' into result;

but that is returning 0, for counts that should have higher numbers.

Can anybody recommend a way to get this count in Postgres?

amatur
  • 317
  • 2
  • 11
  • maybe you should read up on https://stackoverflow.com/questions/3653462/is-storing-a-delimited-list-in-a-database-column-really-that-bad – nbk Aug 04 '23 at 15:45
  • I appreciate this recommendation and resource. The reason for the convention described above is to try and build an application with low energy/data usage, and for resilience contexts. It's oriented around caching and minimal database trips, and also for human readability (e.g. with a cached copy of all messages, by seeing the value "2023-08-04T09:00:32.822+00:00-2-1", you can see the thread by hand even.) Do any other database conventions come to mind, that seem aligned with this kind of convention specifically? – amatur Aug 04 '23 at 16:26

2 Answers2

1

This is a good question.

Your approach seems to be heading in the right direction, but it falls short due to a few errors in the query input || '%-' || repeat('-', hyphens) || '%'. Let us address the small pattern errors and then move onto the query in itself.

  1. There is an issue with the repeat('-', hyphens) pattern. This would mean --- like this when hyphens = 3. Note that we need to have integers after a - as well. Also, having the pattern '%- is not included in the count as well. Changing the pattern to input || repeat('-%') brings us 1 step closer to the solution.
  2. The query does not count the number of - in the input string, because you have input concatenated with a pattern of - repeated hyphen times. In case the input string is always 2023-08-04T09:00:32.822+00:00, changing it to repeat('-%', hyphens-2) would bring us closer to the answer but there is another problem here - % symbol represents any sequence of characters (including zero characters). It can match any substring of a string. So, the query '2023-08-04T09:00:32.822+00:00' || repeat('-%', 1) would match with all 6 values.

Solution:

  1. To mitigate all these issues, I would suggest using other string functions - namely the LENGTH and REPLACE Functions.
  2. We can find the length of the entire string (let us call this l1) and find the length of a new string - which has all its hyphens removed (l2). Substracting l2 from l1, would give us the number of - in the string.
  3. We can remove the hyphens in the string by replacing the - with a null character ''.

Overall, the query would look something like this:

SELECT COUNT(*) AS result
FROM messages_dev
WHERE ref LIKE  input || '-%' 
  AND LENGTH(ref) - LENGTH(REPLACE(ref, '-', '')) = hyphens; 

You could try this out here

  • 1
    Thank you so much for this help, the query and solution you provided is working successfully! One note: when running with "SELECT COUNT(\*) AS result", it gave the error `{ code: '42601', details: null, hint: 'If you want to discard the results of a SELECT, use PERFORM instead.', message: 'query has no destination for result data' }` I switched this to "SELECT COUNT(\*) INTO result" (after already declaring a `result` integer), and this fixed the problem. – amatur Aug 04 '23 at 17:20
0

As you have a string, you can use string functions, to get the next number.

but betst would be not to store it that way, as string functions are usually ver slow, and a normalized table structure suits better for relatinal databases

This query would find all sequences and get the higest number, if you want only specific ones, you need to add which dates you aczaully want.

when you want to use it in parallel systems, you should read about concrency.

CREATE TABLE mytable
    ("mycolumn" varchar(33))
;
    
INSERT INTO mytable
    ("mycolumn")
VALUES
    ('2023-08-04T09:00:32.822+00:00-1'),
    ('2023-08-04T09:00:32.822+00:00-1-1'),
    ('2023-08-04T09:00:32.822+00:00-2'),
    ('2023-08-04T09:00:32.822+00:00-2-1'),
    ('2023-08-04T09:00:32.822+00:00-3'),
    ('2023-08-04T09:00:32.822+00:00-4')
;

WITH CTE AS (select Substring(mycolumn,0,30) datetxt,MAX(split_part(mycolumn, '-', 4))::int +  1 newnum
FROM mytable
GROUP BY 1)
SELECT CONCAT(datetxt,'-', newnum::text)
  FROM CTE
concat
2023-08-04T09:00:32.822+00:00-5

fiddle

nbk
  • 45,398
  • 8
  • 30
  • 47
  • Thank you so much for your help. Can you just show how I would input a specified INPUT and HYPHENS variable into the function you shared? I have little experience with SQL. Also to clarify, the input will not always be a datetime format — sometimes the input could be `2023-08-04T09:00:32.822+00:00-1-1` for example. I think your code still works for this, but I wanted to mention just in case. – amatur Aug 04 '23 at 16:19
  • Also regarding the use of the string function, I shared a response with more context above. Would be glad to hear your thoughts on that as well. (The reason is to try and build an application with low energy/data usage, and for resilience contexts. It's oriented around caching and minimal database trips, and also for human readability (e.g. with a cached copy of all messages, by seeing the value "2023-08-04T09:00:32.822+00:00-2-1", you can see the thread by hand even.)) – amatur Aug 04 '23 at 16:28
  • what is input and what is hyphen, and what is ref, you need to provide for all data, and where they come from with stringsplit you can get all parts of the string that are - and you can access them in an array, or lie i used it, o gtab the first number – nbk Aug 04 '23 at 17:02