This can be done in a single SQL statement. (Unfortunately, this approach does not satisfy the requirement to use regular expressions. EDIT: see the followup below for an inelegant and inefficient approach that does make use of regular expressions.)
We can get a count of occurrences of character 'a' in column A
, and set column B
to the count, with a query like this:
UPDATE mytable t
SET t.B = ( CHAR_LENGTH(t.A) - CHAR_LENGTH(REPLACE(t.A,'a','')) )
Let's unpack that a little bit. It's an UPDATE
statement, there's no WHERE
clause, so we're going to access and potentially update every row in mytable
. (We're assigning a table alias t
. That's not required, but we'll qualify our later references to columns A
and B
using that alias, to make it more clear to someone reading the SQL statement that those are references to columns in the table.
On the next line, we've got a straightforward SET
clause, assigning a value to column B
.
It's the next expression, the value we're assigning to column B
, that we should unpack a little bit.
To get a count of the 'a'
characters, we can use a little trick: we know that an 'a'
character is exactly one character in length.
The "trick" is to use the REPLACE
function, to search for all occurrences of the character 'a'
, and remove them (replacing them with a zero length string). And we can then compare the lengths of (number of characters in) the two strings. The difference is the number of 'a'
characters in the original string.
As a demonstration of how those expressions work:
SELECT t.foo AS foo
, REPLACE(t.foo,'a','') AS foo_2
, CHAR_LENGTH(t.foo) AS len
, CHAR_LENGTH(REPLACE(t.foo,'a','')) AS len_2
, CHAR_LENGTH(t.foo) - CHAR_LENGTH(REPLACE(t.foo,'a','')) AS `len-len_2`
FROM ( SELECT 'a b a a' AS foo
UNION ALL SELECT 'b c a'
UNION ALL SELECT 'b c d'
UNION ALL SELECT 'a '
) t
The returned from that query:
foo foo_2 len len_2 len-len_2
------- ------ ------ ------ ---------
a b a a b 7 4 3
b c a b c 5 4 1
b c d b c d 5 5 0
a 3 2 1
Note: what is being returned is basically a count of the number of characters that were removed. So if we wanted to count occurrences of a string of multiple characters, for eample: cat
, we'd need to account for that.
Dividing the returned value by the number of characters in cat
would be one way to do that. Or, we could replace the string cat
with a string that is two characters in length, e.g. 'xx'
, so the difference in length would be one character per occurrence.
FOLLOWUP
The original question asked how to count the 'a' characters using regular expressions. My first thought was that it wasn't possible with MySQL REGEXP
because the return from that is either NULL, 0 or 1. But thinking about it a little bit, it could be done, if we are counting up to some finite number of occurrences. It would be possible to check if a string contains at least one 'a' character, that's pretty straightforward:
'a b a a' REGEXP 'a'
That will return 1 if it matches, and 0 if it doesn't. It's also possible to check if the string contains at least two 'a' characters. That's pretty straightforward too:
'a b a a' REGEXP 'a.*a'
If we add together the result from the two expressions above, we could get a count of either 0, 1 or 2 for the number of 'a' characters.
We can repeat that same pattern, to extend this to match to 3, 4, 5, etc. 'a' characters.
It's not elegant, and we certainly don't want to find out how warm the CPU is going to get doing all these comparisons. But it does return the specified result, up to some finite maximum count. In this example, six. (Strings containing more than six 'a' characters will return a count of 6.
As a demonstration:
SELECT t.foo
, (t.foo REGEXP CONCAT('.*',REPEAT('a.*',1)))
+ (t.foo REGEXP CONCAT('.*',REPEAT('a.*',2)))
+ (t.foo REGEXP CONCAT('.*',REPEAT('a.*',3)))
+ (t.foo REGEXP CONCAT('.*',REPEAT('a.*',4)))
+ (t.foo REGEXP CONCAT('.*',REPEAT('a.*',5)))
+ (t.foo REGEXP CONCAT('.*',REPEAT('a.*',6)))
AS cnt_a
FROM ( SELECT 'a b a a' AS foo
UNION ALL SELECT 'b c a'
UNION ALL SELECT 'b c d'
UNION ALL SELECT 'a '
) t