20

I have the following query

SELECT Count(*) as Total_Count, Col1 
FROM Table1 
GROUP BY Col1 
ORDER BY Total_Count DESC;

I want to zoom in on Col1. The data in Col1 are in the following format:

text-abc1
txt4-abcde22
tex6-abc2
text4-imp4
text-efg1
txt-efg43

I want to be able to group it by

After the first `-`, any first three/four/five characters match

In this example, if we match with first 3 characters. Output will be:

Total_Count   Col1
3             abc
1             imp
2             efg

Any other way to achieve this?

theking963
  • 2,223
  • 7
  • 28
  • 42

5 Answers5

18

You might not need a regex, just string operations. For three characters:

SELECT count(*) AS Total_Count,
SUBSTRING(Col1 FROM POSITION('-' in Col1)+1 FOR 3) AS Col1_zoomed
FROM Table1
GROUP BY Col1_zoomed
ORDER BY Total_Count DESC
Ian Clelland
  • 43,011
  • 8
  • 86
  • 87
  • This is the right one. Thanks for it. But `abcd-1234` needs to be `Col1`. Please change it so I can accept it :). – theking963 Nov 25 '11 at 19:49
  • 1
    @daking963 As if this answer doesn't help you when one thing is spelled differently? Picky, picky! :-p – Wiseguy Nov 25 '11 at 19:53
  • 1
    @Wiseguy It's not about being picky. It's for the correctness of the answer and a future reference for others. I am sure they can figure it out that instead of the string it should be the column name. It's just a typo, Ian can change it easily. To improve the overall quality of the site, in my opinion, these little things matter a lot! – theking963 Nov 25 '11 at 19:58
  • Done :) Sorry, it was my sample data from shell-testing – Ian Clelland Nov 25 '11 at 20:22
4
select
substring(substring_index(col1,'-',-1),1,3) as grp,
count(*) as total
from table
group by grp
Nicola Cossu
  • 54,599
  • 15
  • 92
  • 98
1

I wanted to answer the question "MySQL GROUP by Regex?" as the answers here address the problem provided.

You can group by REGEXP with the REGEXP_SUBSTR() function.

REGEXP_SUBSTR(expr, pat[, pos[, occurrence[, match_type]]])

Returns the substring of the string expr that matches the regular expression specified by the pattern pat, NULL if there is no match. If expr or pat is NULL, the return value is NULL.

For example:

SELECT 
    *
FROM
    YOUR_TABLE
GROUP BY REGEXP_SUBSTR(YOUR_COLUMN, 'YOUR REGEXP');

Reference: https://dev.mysql.com/doc/refman/8.0/en/regexp.html#function_regexp-substr

Jon
  • 1,954
  • 1
  • 15
  • 13
1

This should do what you want.

SELECT Count(*) as Total_Count, SUBSTRING(Col1, 1, 3)
FROM Table1 
GROUP BY SUBSTRING(Col1, 1, 3) 
ORDER BY Total_Count DESC;
Adrian Cornish
  • 23,227
  • 13
  • 61
  • 77
  • `SUBSTRING(Col1, 1, 3)` won't work in this case as I need to match characters after the `-`. Before the `-` there could be any number of characters. – theking963 Nov 25 '11 at 19:45
  • I misread your question - I thought you wanted the first 3. Ian's answer is better then – Adrian Cornish Nov 25 '11 at 19:46
0
SELECT Count(*) as Total_Count, Col1, REGEXP_SUBSTR(Col1, '[a-z0-9]*', 0, 2) as Col1_combined 
FROM Table1 
GROUP BY REGEXP_SUBSTR(Col1, '[a-z0-9]*', 0, 2) 
ORDER BY Total_Count DESC;

REGEXP_SUBSTR returns the string matching the given regular expression [a-z0-9] starting from position 0 in text and return the 2 occurrence. Documentation for REGEXP_SUBSTR

REGEXP_SUBSTR(<column_name>, <regular_expression>, <starting_position>, <match_occurrence>)

Yashashvi
  • 340
  • 3
  • 12