I want to save in DB the number of times each of out clients enter the website by week day and hour. Meaning that for each one of the clients I will have 24 * 7 values that will be constantly updating to reflect the peak hour with the most visits for the client. I've seen the obvious suggestion Database structure for holding statistics by day, week, month, year to create a new line for each entrance and than use the data, it won't work, we have millions of lines and I need the peak hour for each client to be available. Also, creating 168 columns for each clients looks a little extreme. Any suggestions?
-
1"Meaning that for each one of the clients I will have 24 * 7 values" - you also store 0? Even the most used apps will not be busy 24/7. – TomTom Sep 05 '20 at 19:00
-
"to create a new line for each entrance and than use the data, it won't work, we have millions of lines" - so what? Seriously? I have been in dwh projects storing 440 million rows PER DAY and keeping them for 10 years. millions of lines is something your phone can handle. – TomTom Sep 05 '20 at 19:01
2 Answers
A reality check may be in order.
I will have 24 * 7 values that will be constantly updating to reflect the peak hour with the most visits for the client.
This is assuming you store 24 entries per day, i.e. also entries the client does not show up. Unless your client is a company, people DO sleep.
to create a new line for each entrance and than use the data, it won't work, we have millions of lines
So? Millions of rows in a table was not a problem 30 years ago. It definitely is not a problem today. Yes, it may not run on a 20 year old desktop - but on a decent middle size server with half a tb of ram and a proper disc layout you can store hundreds of gigabytes of data and process them fast.
Also, creating 168 columns for each clients looks a little extreme.
It also is stupid. And i mean stupid. See - the problem is that while you (and your app) can possible work with it, you will soon find out that if you try to load the data i.e. into a reporting tool and find all rows for a specific hour - you are living in a world of pain. One row per entry is what the relational data model suggests. Anything else is either SMART (for a VERY limited use case) or demonstrating that tools follow the relational theorem VERY fast and you live in a world of REALLY painful code.
Not that it does not happen. I have seen people writing one new table for every invoice written (so the invoice details table is not getting too long)...
With one entry per entrance you can make a group by day of week, aggregate by hous - with 168 fields per row you can not easily.
Generally: This is 2020. A mid range desktop has 64gb RAM. A mid range server has a terabyte or two. SSD storage makes it easy and fast to for databases to work with hundreds of gb of data - something that was AWFULLY painfull in the time of hard discs. MILLIONS of rows where a joke in my first commercial level db project 26 years ago. Today, TENS OF BILLIONS are small change.

- 61,059
- 10
- 88
- 148
Here's a table structure (similar to one which I've seen implemented) which separates out the summary stats into week, date (or day), and hour tables with conjoining primary key to foreign key relationships. Instead of storing the different hours of the day as columns (which is not recommended in rdbms) it stores them in rows. Millions of visits per day (or hour) could be handled with appropriate indexes and partitioning as needed.
Something like this
DDL
create table dbo.visitor_events(
v_id int identity(1,1) primary key not null,
client_id int not null references clients(client_id),
visit_dt datetime2(7) not null default sysutcdatetime());
create table dbo.visitor_event_weeks(
vsw_id int identity(1,1) primary key not null,
client_id int not null references clients(client_id),
visit_wk int not null,
visits int not null);
create table dbo.visitor_event_dates(
vsd_id int identity(1,1) primary key not null,
client_id int not null references clients(client_id),
vsw_id int not null references visitor_event_weeks(vsw_id),
visit_wk int not null,
visit_dt datetime not null,
visits int not null);
create table dbo.visitor_event_hours(
vsh_id int identity(1,1) primary key not null,
client_id int not null references clients(client_id),
vsd_id int not null references visitor_event_dates(vsd_id),
visit_hr datetime not null,
visits int not null);
var
Variables and insert/update statement (would vary depending on what's best for the OP)
declare
@client_id int=123,
@visit_dt datetime2(7)=sysutcdatetime();
declare
@v_id int;
declare
@vsw table(vsw_id int unique not null);
declare
@vsd table(vse_id int unique not null);
/* Insert a visit */
insert dbo.visitor_events(client_id, visit_dt) values
(@client_id, @visit_dt);
select @v_id=scope_identity();
/* Insert/update a visit week */
update dbo.visitor_event_weeks
set visits=visits+1
output inserted.vsw_id into @vsw
where client_id=@client_id
and visit_wk=datediff(wk, 0, @visit_dt);
if @@rowcount>0
begin
insert dbo.visitor_event_weeks(client_id, visit_wk, visits)
output inserted.vsw_id into @vsw
values (@client_id, datediff(wk, 0, @visit_dt), 1);
end
/* Insert a visit date */
update dbo.visitor_event_dates
set visits=visits+1
output inserted.vsd_id into @vsd
where client_id=@client_id
and vsw_id=(select top 1 vsw_id from @vsw)
and visit_dt=cast(@visit_dt as date);
if @@rowcount>0
begin
insert dbo.visitor_event_dates(client_id, vsw_id, visit_dt, visits)
output inserted.vsd_id into @vsd
values (@client_id, (select top 1 vsw_id from @vsw), cast(@visit_dt as date), 1);
end
/* Insert a visit date hour */
update dbo.visitor_event_dates
set visits=visits+1
output inserted.vsd_id into @vsd
where client_id=@client_id
and visit_dt=cast(@visit_dt as date);
if @@rowcount>0
begin
insert dbo.visitor_event_hours(client_id, vsd_id, visit_dt, visits)
output inserted.vsd_id into @vsd
values (@client_id, (select top 1 vsw_id from @vsw), cast(@visit_dt as date), 1);
end

- 5,955
- 2
- 11
- 24