
We have a table which keeps the log of internet usage inside our company. This table is filled by a piece of software we bought, and we cannot make any changes to its table. The table does not have a unique key or index (to make writing the data faster, as its developers say).

I need to read the data in this table to create real-time reports of internet usage by our users.

Currently I'm reading data from this table in chunks of 1000 records. My problem is keeping track of the last record I have read from the table, so I can read the next 1000 records.

What is the best possible solution to this problem?

By the way, earlier records may get deleted by the software as needed if the database file grows too large.

Rahmatizadeh
  • Create a table, possibly in another database, that meets all of your needs, with an identity column or something. Start copying all the records into that table; you could even schedule a task. Then work out of the new table, whose structure lets you store and report on records over the longer term. – Vincent James Jan 24 '13 at 12:22
  • Records are added to this table every second, so how do I know up to which record I have imported into the new table? – Rahmatizadeh Jan 24 '13 at 12:43
  • If the datetime stamp is precise enough you can use that field as your import start/stop point. If not, you could possibly add a trigger to Table1 and have the trigger automatically create the records in Table2 (your friendly table); a minimal sketch follows these comments. – Vincent James Jan 24 '13 at 12:47
  • I tried using the datetime stamp, but under certain conditions I might lose more than 500 records. Adding a trigger might be a good solution, though. I haven't done that before, so are there any good documents on that? Thanks – Rahmatizadeh Jan 24 '13 at 12:56
  • Here is something on this site that has the basics. http://stackoverflow.com/questions/4820302/how-to-work-with-after-insert-trigger-in-sql-server-2008 – Vincent James Jan 24 '13 at 13:03
  • This table gets duplicates as well, I imagine? – Zane Jan 24 '13 at 19:18
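
A minimal sketch of the trigger approach suggested in these comments, assuming SQL Server and a hypothetical reporting table dbo.ReportLog with its own identity column; the column names LOGTime, UserName and Url are placeholders, since the real schema is not shown:

CREATE TABLE dbo.ReportLog (
    ReportLogId int IDENTITY(1,1) PRIMARY KEY,  -- the unique key the source table lacks
    LOGTime  datetime,
    UserName varchar(100),
    Url      varchar(2000)
);
GO

CREATE TRIGGER trg_CopyToReportLog
ON dbo.yourtable
AFTER INSERT
AS
BEGIN
    SET NOCOUNT ON;
    -- "inserted" holds every row of the current insert batch,
    -- so each new log record is copied into the reporting table.
    INSERT INTO dbo.ReportLog (LOGTime, UserName, Url)
    SELECT LOGTime, UserName, Url
    FROM inserted;
END

Reports can then page through dbo.ReportLog by ReportLogId even after the software purges old rows from the original table.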

3 Answers


Depending on your version of SQL Server, you can use row_number(). Once the row_number() is assigned, you can page through the records:

select *
from
(
   select *,
     -- "id" stands for whatever column defines the row order in your table
     row_number() over(order by id) rn
   from yourtable
) src
where rn between 1 and 1000

Then when you want to get the next set of records, you could change the values in the WHERE clause to:

where rn between 1001 and 2000

Based on your comment that the data gets deleted, I would do the following.

First, insert the data into a temp table:

select *,  row_number() over(order by id) rn
into #temp
from yourtable

Then you can select the data by row number in any block as needed.

select *
from #temp
where rn between 1 and 1000
Taryn
  • I cannot use row numbers because, as I stated, records may get deleted from the table as needed – Rahmatizadeh Jan 24 '13 at 12:39
  • If the rows might disappear, then I would apply the `row_number()` when inserting the data into a temp table. Then you could still use row number – Taryn Jan 24 '13 at 12:42
  • The problem is getting the data into the temp table. Records are being added every second; how do I know which records I have imported so far and which records still need to be imported? – Rahmatizadeh Jan 24 '13 at 12:48
  • @Rahmatizadeh The only other suggestion that I would have would be to place a trigger on the main table that inserts an exact copy of the data into another table that you can use for reporting. – Taryn Jan 24 '13 at 13:10
  • @bonCodigo The LOGTime field is not unique. It stores the time of the web request, not the time the log was written to the database. There could be multiple web requests at the same time, or a single web request could produce multiple connections at the same time. – Rahmatizadeh Jan 24 '13 at 15:13
  • @Rahmatizadeh If the `LogTime` is not unique then I would go back to my suggestion to use a trigger. Then when a new record is inserted in the main table, a copy would be inserted into your other table that could be used for reporting. – Taryn Jan 24 '13 at 19:17

This would also help:

declare @numRecords int = 1000  --Number of records needed per request
declare @requestCount int = 0   --Request number, starting from 0 and incremented by 1 for each request

select top (@numRecords) *
from
(
   select *, row_number() over(order by id) rn
   from yourtable
) T
where rn > @requestCount*@numRecords

EDIT: As per comments

CREATE PROCEDURE [dbo].[select_myrecords]
   --Number of records needed per request
   @NumRecords int,               --(e.g. 1000)
   --Datetime of the LAST RECORD of the previous result set, or null for the first request
   @LastDateTime datetime = null
AS
BEGIN

    select top (@NumRecords) *
    from yourtable
    where LOGTime < isnull(@LastDateTime, getdate())
    order by LOGTime desc

END
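
A possible way to call it, passing the LOGTime of the last row of the previous batch as the starting point (a usage sketch; the literal datetime is only illustrative):

-- First request: the latest 1000 records
EXEC dbo.select_myrecords @NumRecords = 1000, @LastDateTime = NULL;

-- Next request: pass the LOGTime of the last row returned by the previous call
EXEC dbo.select_myrecords @NumRecords = 1000, @LastDateTime = '2013-01-24 12:00:00';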
Kaf
  • I cannot use row numbers because records can be deleted from the table as needed – Rahmatizadeh Jan 24 '13 at 12:41
  • The table has a LOGTime field, which is a datetime field. I think the program uses this field to delete every record which is over one month old – Rahmatizadeh Jan 24 '13 at 12:54
  • You said *there is no unique field*, yet *the table has a LOGTime field which is a datetime field*... :$ – bonCodigo Jan 24 '13 at 13:10
  • Then you don't need row_number(). You can order records by `LOGTime desc` and then select the top 1000. That will give you the latest 1000 records. Isn't that what you need? – Kaf Jan 24 '13 at 13:12
  • The LOGTime field is not unique. It stores the time of the user's web request, not the time the log was written. – Rahmatizadeh Jan 24 '13 at 15:09
  • Isn't it written at the time of the request? – Kaf Jan 24 '13 at 15:33
  • @Kaf No, it has a local queue so that if there is a delay writing to the database, no log is lost. – Rahmatizadeh Jan 24 '13 at 17:29

Without any index you cannot efficiently select the "last" records. The solution will not scale. You cannot use "real-time" and "repeated table scans of a big logging table" in the same sentence.

Actually, without any unique identification attribute for each row you cannot even determine what's new (proof: say you had a table full of thousands of booleans. How would you determine which ones are new? They cannot be told apart; you cannot find out). There must be something you can use, like a combination of DateTime and IP. Or you can add an IDENTITY column, which is likely to be transparent to the software you use.

Probably, the software you use will tolerate you creating an index on some ID or DateTime column as this is transparent to the software. It might create more load, so be sure to test it (my guess: you'll be fine).
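
A minimal sketch of that idea, assuming the logging software tolerates the change as suggested above; the table name yourtable and the column name LogId are placeholders:

-- Add an identity column; inserts that list their columns explicitly are unaffected
ALTER TABLE dbo.yourtable ADD LogId int IDENTITY(1,1);

-- Index it (or an existing DateTime column) so incremental reads avoid a full table scan
CREATE NONCLUSTERED INDEX IX_yourtable_LogId ON dbo.yourtable (LogId);

-- Incremental read: everything after the highest id already processed
DECLARE @LastProcessedId int = 0;  -- persist this value between report runs

SELECT *
FROM dbo.yourtable
WHERE LogId > @LastProcessedId
ORDER BY LogId;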

usr
  • The software checks the table structure and refuses to log to the table if there is any change. Currently I'm keeping the last value read from the LOGTime field, and on the next read I start from the next second, but that means skipping up to 700 or 800 records – Rahmatizadeh Jan 24 '13 at 15:19
  • @Rahmatizadeh without any indexes this always requires a full table scan. Are you OK with that? How big is the table? – usr Jan 24 '13 at 15:28
  • The table is very big as it's storing the logs of 3000 users – Rahmatizadeh Jan 24 '13 at 17:31
  • And what about my other question. Are you OK with table scans? How big is the table in GB? How long does a scan take? – usr Jan 24 '13 at 17:37