It's often useful to have a range or sequence table that gives you a source of large runs of contiguous sequential numbers, like this one covering the range -100,000–+100,000.
drop table dbo.range
go
create table dbo.range
(
id int not null primary key clustered ,
)
go
set nocount on
go
declare @i int = -100000
while ( @i <= +100000 )
begin
if ( @i > 0 and @i % 1000 = 0 ) print convert(varchar,@i) + ' rows'
insert dbo.range values ( @i )
set @i = @i + 1
end
go
set nocount off
go
Once you have such a table, you can do something like this:
select character = substring( t.some_column , r.id , 1 ) ,
frequency = count(*)
from dbo.some_table t
join dbo.range r on r.id between 1 and len( t.some_column )
group by substring( t.some_column , r.id , 1 )
order by 1
If you want to ensure case-insensitivity, just mix in the desired upper()
or lower()
:
select character = upper( substring( t.some_column , r.id , 1 ) ) ,
frequency = count(*)
from dbo.some_table t
join dbo.range r on r.id between 1 and len( t.some_column )
group by upper( substring( t.some_column , r.id , 1 ) )
order by 1
Given your sample data:
create table dbo.some_table
(
some_column varchar(50) not null
)
go
insert dbo.some_table values ( 'Elias' )
insert dbo.some_table values ( 'Sails' )
insert dbo.some_table values ( 'Pails' )
insert dbo.some_table values ( 'Plane' )
insert dbo.some_table values ( 'Games' )
go
The latter query above produces the following results:
character frequency
A 5
E 3
G 1
I 3
L 4
M 1
N 1
P 2
S 5