3

I'm having some trouble with a query I want to write.

I have a table consists of files and their size in bytes. It looks like this:

FileUrl | FileSize
------------------
xyz.docx | 2794496
qwe.ppt | 655360
asd.pdf | 1388782
...
...

What I want is finding the number of files, % of total file count and % of total file size based on different size groups that I will define. So it should like this:

Size Category | Number of Files | % of Total File Count | ½ of Total File Size
------------------------------------------------------------------------------
0-1 MB        | 235             | 80%                   | 20%
1-10 MB       | 57              | 20%                   | 80%
10-50 MB
...
...

What is the best way to create such groups and then finding those percentages? I can't come up with a solution and my online searches didn't help at all.

Thank you in advance

  • Possible duplicate of [In SQL, how can you "group by" in ranges?](http://stackoverflow.com/questions/232387/in-sql-how-can-you-group-by-in-ranges) – DavidG Mar 08 '17 at 13:55

7 Answers7

2

Use case, and a CTE

with CTE as
(
select case 
           when filesize < 1024 then '0 - 1 MB'
           when ...
           else '50MB+'
       end as FileGroup,
       Filedata.*
from Filedata
)
select FileGroup, 
       count(Filename) as NumberOfFiles, 
       count(filename) / (select count(*) from CTE) as PCTotalCount, 
       sum(Filesize) / (select sum(filesize) from CTE) as PCTotalSize
from CTE
group by FileGroup
JohnHC
  • 10,935
  • 1
  • 24
  • 40
2

Here is one method using apply and window functions:

select v.sizecategory, count(*) as numfiles,
       (count(*) / sum(1.0 * count(*)) over () as ratio_files,
       (sum(filesize) / sum(sum(filesize) * 1.0)) over () as ratio_sizes
from t outer apply
     (values (case when t.filesize < 1000000 then '0-1 MByte'
                   when t.filesize < 10000000 then '1-10 MByte'
                   when t.filesize < 50000000 then '10-50 MByte'
                   . . .
              end)
     ) v(sizecategory) 
group by v.sizecategory
order by min(t.filesize);
Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
1

Alternatively you can go with this:

SELECT
CASE 
WHEN FileSize < 1024 THEN '0-1 MB'
WHEN FileSize  FileSize < 10240 THEN '1-10 MB'
WHEN FileSize  FileSize < 51200 THEN '10-50 MB'
-- continue with this...
END 'SizeCategory',
count (fileUrl) as 'Number of Files',
(count (fileUrl)) / (sum(count (fileUrl)) over (order by null)) as '% of Total File Count',
(sum(FileSize)) / (sum(FileSize) over (order by null)) as '½ of Total File Size'
FROM table
GROUP BY SizeCategory
Georgi Raychev
  • 1,288
  • 1
  • 13
  • 26
  • 2
    When using case, it evaluates in order. As a result, you don;t need the `FileSize >= 1024`, only the `Filesize < 10240` (repeat for each case). Your final option is the `Else`, which would be the max range – JohnHC Mar 08 '17 at 14:22
  • That makes total sense. Thanks for the comment :) – Georgi Raychev Mar 08 '17 at 14:23
1

What you are asking for is a histogram. This might help:

SELECT FLOOR((filesize*1000000)/10.00)*10 As SizeCategory, 
       COUNT(*) AS [NumFiles]
FROM TableName
GROUP BY FLOOR((filesize*1000000)/10.00)*10 
ORDER BY 1

This will give you equal sized intervals.

Juan Carlos Oropeza
  • 47,252
  • 12
  • 78
  • 118
Karthik
  • 106
  • 3
1

Create your groups table with a range as follows:

RangeLow | RangeHigh | Number of Files | PercentCount | ½ of Total 
------------------------------------------------------------------------------
    0    |  1024000  | 235             | 80%                   | 20%
1024000  |  2048000  | 57              | 20%                   | 80%

Then your Query can look like this:

Select FileUrl, FileSize, (Select PercentCount  From GroupTable Where FileSize >= RangeLow AND FileSize < RangeHigh )  From FilesTable 
jfatal
  • 245
  • 1
  • 3
1

If you can't adjust the dataset SQL, you can instead use expressions in your SSRS report to categorise the file sizes. Use an expression like the following in both your row header cell, and the Group on expression of your row group:

=Switch(Fields!FILESIZE.Value < 1024, "0-1 MB",
Fields!FILESIZE.Value < 10240, "1-10 MB",
Fields!FILESIZE.Value < 51200, "10-50 MB",
True, "50+ MB")

You can then calculate the totals within the report too. Use the second parameter of the Count() and Sum functions to define the scopes:

=Count(Fields!FILESIZE.Value)
=Count(Fields!FILESIZE.Value, "RowGroup") / Count(Fields!FILESIZE.Value, "DataSet1")
=Sum(Fields!FILESIZE.Value, "RowGroup") / Sum(Fields!FILESIZE.Value, "DataSet1")
Pete Rennard-Cumming
  • 1,588
  • 10
  • 19
1

Many perfectly valid answers here. However, I prefer to maintain a generic Tier Table which can serve multiple masters, and removes the logic from code and offers a little more flexibility down-the-road.

Example

Declare @Tier table (Tier_Grp varchar(50),Tier_Seq int,Tier_Dsc varchar(100),Tier_R1 float,Tier_R2 float)
Insert Into @Tier values
('File Size',1  ,'0-1 MB'   ,0    ,1e+6 ),
('File Size',2  ,'1-10 MB'  ,1e+6 ,1e+7 ),
('File Size',3  ,'10-50 MB' ,1e+7 ,5e+7 ),
('File Size',4  ,'50-100 MB',5e+7 ,1e+8 ),
('File Size',5  ,'100 MB +' ,1e+8 ,9e+11 ),
('File Size',999,'Total'    ,0    ,9e+12 )  --< Optional

Declare @YourTable table (FileUrl varchar(50),FileSize int)
Insert Into @YourTable values
('xyz.docx',2794496),
('qwe.ppt',655360),
('asd.pdf',1388782)

;with cte as (
    Select A.* 
          ,Cnt   = count(FileSize)+0.0
          ,Total = isnull(sum(cast(FileSize as float)),0)
     From  @Tier A
     Left Join  @YourTable B on Tier_Grp='File Size' and B.FileSize >=Tier_R1 and B.FileSize<Tier_R2
     Group By Tier_Grp
             ,Tier_Seq
             ,Tier_Dsc
             ,Tier_R1
             ,Tier_R2
)
Select Tier_Grp
      ,Tier_Dsc
      ,[Number of Files]  = cast(Cnt as int)
      ,[Number of Bytes]  = Total 
      ,[Percent of Files] = format(Cnt/max(Cnt) over (),'0.0%')
      ,[Percent of Size]  = format(Total/max(Total) over (),'0.0%')
 From cte
 Order by Tier_Seq

Returns

enter image description here

John Cappelletti
  • 79,615
  • 7
  • 44
  • 66