
I am using SQL Server 2012

I have a long-running Extended Events session (it runs for days to capture events) that saves to a .xel file.

I have a job that runs periodically to import the data into a staging table. I am only importing the XML event_data column from the file so I can parse out the XML fields I need and save to a table for reporting.

I know when the import last ran, so I want to see if I can select only the records from the file that were added since then.

I have it working now, but it imports ALL the records from the file into a staging table, parses out the fields I need (including the timestamp), and then inserts only the records with a timestamp later than the last run.

My process only inserts the new ones since the last time the job ran, so this all works fine, but it does a lot of work importing and parsing the XML for ALL the records in the file, including the ones already imported on previous runs.

So I want to find a way to not import from the file at all if a record was already imported, or at least not have to parse the XML for the records that were already imported (though right now I have to parse every record just to get the timestamp I use to exclude the ones already processed).

Below is what I have, and as I said, it works, but it is doing a lot of extra work that could be avoided if I could skip the records already imported.

I have only included the steps of my process that I need help with:

-- pull data from file path and insert into staging table
INSERT INTO #CaptureObjectUsageFileData (event_data)
SELECT cast(event_data as XML) as event_data
FROM sys.fn_xe_file_target_read_file(@FilePathNameToImport, null, null, null)


-- parse out the data needed (only columns using) and insert into temp table for parsed data
INSERT INTO #CaptureObjectUsageEventData (EventTime, EventObjectType, EventObjectName)
SELECT n.value('(@timestamp)[1]', 'datetime') AS [utc_timestamp],
n.value('(data[@name="object_type"]/text)[1]', 'varchar(500)') AS ObjectType,
n.value('(data[@name="object_name"]/value)[1]', 'varchar(500)') as ObjectName
from (
    SELECT event_data
    FROM #CaptureObjectUsageFileData (NOLOCK)
) ed
CROSS apply ed.event_data.nodes('event') as q(n)


-- select from temp table as another step for speed/conversion
--  converting the timestamp to smalldatetime so it doesn't keep milliseconds; that way SELECT DISTINCT won't produce lots of near-duplicate rows
INSERT INTO DBALocal.dbo.DBObjectUsageTracking(DatabaseID, ObjectType, ObjectName, ObjectUsageDateTime)
SELECT DISTINCT @DBID, EventObjectType, EventObjectName, CAST(EventTime AS SMALLDATETIME)
FROM #CaptureObjectUsageEventData
WHERE EventTime > @LastRunDateTime
Brad
  • I don't know if I get this correctly... There's no way to avoid running through the XML to check the DateTime values. So my suggestion was: read the whole lot within a CTE into a derived table (or into a temp table) and use either `INSERT` with `WHERE NOT EXISTS()` or you might think about `MERGE` (a sketch of the `NOT EXISTS()` variant follows after these comments). But both approaches need the data to be a normal set... – Shnugo Jan 25 '19 at 18:54
  • @Shnugo That is what I had in mind once I get the XML into my temp table. I was trying to find out (if it's possible) whether there is an option on the step that imports the XML from the .xel file to exclude them at that step. I see some option in SQL 2017 I think, but wanted to find out if there was a way to do it in 2012, or perhaps a better/more efficient way to do the XML parsing step (that's where 99% of the work is taking place) – Brad Jan 25 '19 at 18:58
  • Brad, at the moment you are reading the file, you won't get any help. Not before SQL-Server can deal with this in native XML type. But you might use an `xslt` or some external application to create the reduced file *before* you get into SQL-Server... – Shnugo Jan 25 '19 at 19:06
  • @Shnugo I wasn't sure if there were any options on the file import (sys.fn_xe_file_target_read_file(@FilePathNameToImport, null, null, null)) that I was missing. There are some added parameters for an offset into the files, but I cannot wrap my head around what/how they work or what exactly they are doing, so I wasn't sure if those would be options. – Brad Jan 25 '19 at 19:15
  • I don't know actually... I think this is just a simple counter. So, if you know the count of elements you've imported already, it might be enough, to set this as the `initial_offset` parameter... Sorry, no experience with this... – Shnugo Jan 25 '19 at 19:18
  • Brad, one more idea: you could use the offset parameter with some buffer so you don't read as many rows into your staging table, and additionally the `.nodes()` predicate to reduce the parsed rows... – Shnugo Jan 25 '19 at 19:23
  • Yeah, the offset is what I am not sure about: how it works, what exactly it does, and whether it would fit with what/how I am doing things. Appreciate the help – Brad Jan 25 '19 at 19:37
  • I believe the offset parameter may work, but no documentation tells how to retrieve/define the offset value. – jyao Oct 11 '19 at 18:31
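
A minimal sketch of the `WHERE NOT EXISTS()` idea from the first comment, assuming the temp table and target table from the question's code. It avoids re-inserting rows that are already in the reporting table, but note it still parses the XML for every record in the file:

-- Sketch only: insert parsed rows that are not already in the reporting table.
-- Table, column, and variable names are taken from the question's code.
INSERT INTO DBALocal.dbo.DBObjectUsageTracking (DatabaseID, ObjectType, ObjectName, ObjectUsageDateTime)
SELECT DISTINCT @DBID, s.EventObjectType, s.EventObjectName, CAST(s.EventTime AS SMALLDATETIME)
FROM #CaptureObjectUsageEventData AS s
WHERE NOT EXISTS (
    SELECT 1
    FROM DBALocal.dbo.DBObjectUsageTracking AS t
    WHERE t.DatabaseID          = @DBID
      AND t.ObjectType          = s.EventObjectType
      AND t.ObjectName          = s.EventObjectName
      AND t.ObjectUsageDateTime = CAST(s.EventTime AS SMALLDATETIME)
);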

3 Answers


Okay, I've placed a comment already, but after thinking a bit deeper and looking into your code, this might be rather simple:

You can store the time of your last import and use a predicate in .nodes() (just as you do in .value() to get to the correct <data> element).

Try something like this:

DECLARE @LastImport DATETIME=GETDATE(); --put the last import's time here

and then

CROSS apply ed.event_data.nodes('event[@timestamp cast as xs:dateTime? > sql:variable("@LastImport")]') as q(n)

Doing so, .nodes() should return only the <event> elements where the condition is fulfilled. If this does not help, please show a reduced example of the XML and what you want to get.
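
Plugged into the parsing step from the question, the filtered apply would sit roughly like this (a sketch only; table, column, and variable names are taken from the question's code):

-- Sketch: parse only events newer than the last import, using the question's staging tables.
DECLARE @LastImport DATETIME = GETDATE(); -- replace with the stored last-import time

INSERT INTO #CaptureObjectUsageEventData (EventTime, EventObjectType, EventObjectName)
SELECT n.value('(@timestamp)[1]', 'datetime') AS [utc_timestamp],
       n.value('(data[@name="object_type"]/text)[1]', 'varchar(500)') AS ObjectType,
       n.value('(data[@name="object_name"]/value)[1]', 'varchar(500)') AS ObjectName
FROM #CaptureObjectUsageFileData AS ed
CROSS APPLY ed.event_data.nodes('event[@timestamp cast as xs:dateTime? > sql:variable("@LastImport")]') AS q(n);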

Shnugo
  • I already have the last-imported date/time, which I pull from a table, so that part is already set. This looks like it will help a lot if it reduces the number of rows that need to be parsed in the XML (parsing the 3 fields is what takes the longest in my script), so if this part runs BEFORE the selecting/parsing of the individual fields (which I believe it does), this will help. I just have to test it out and let you know – Brad Jan 25 '19 at 19:14
  • @Brad Yeah, it can make a big difference if you read a full XML and filter **outside** with a `WHERE` or if you can place the predicate **within XQuery** to return just rows you are looking for... – Shnugo Jan 25 '19 at 19:16
  • I got an error and had to make one update (maybe down to the SQL version): CROSS apply ed.event_data.nodes('event[@timestamp cast as xs:dateTime? > sql:variable("@LastRunDateTime")]') as q(n). I had to add the ? after xs:dateTime for proper casting. – Brad Jan 25 '19 at 19:34
  • I am still in DEV and working with smaller data sets, but that seems to work pretty well. I compared execution plans with my smaller data and the XML parsing steps went down 4-5% of overall plan cost for each one (there is tons of nested stuff in the plan for the XML parsing). It's not a high number, but with more and more data it will help out a LOT. Plus I can get rid of a couple of my later steps where I was inserting into an intermediate temp table for the parsing, then running the date check and inserting into my permanent table. With this I can just insert directly into my permanent table! – Brad Jan 25 '19 at 19:37
  • This helped out a lot and you seem to get the XML stuff, maybe you can take a look at my other question (same project, different part): https://stackoverflow.com/questions/54369385/sql-query-xml-for-extended-events-system-table-for-session-object-not-data-file – Brad Jan 25 '19 at 19:42
  • Glad to help you! Sorry about the `xs:dateTime?`, the question mark is mandatory here... And yes, I'll look into your other question in a while... – Shnugo Jan 25 '19 at 19:52

I accepted the answer above, but I'm posting the code for the section I had questions about in full, with the updates/fixes from the comments (again, not the entire code, just the important parts). Using @Shnugo's help I was able to completely remove a temp table from my process that I only needed for the date filtering before inserting into my permanent table; with his answer I can insert directly into the permanent table. In my testing with small data sets, the update and the removal of the extra code reduced the running time by a third. The more data I get, the bigger the impact this improvement will have.

This is designed to run an Extended Events session over a long period of time. It tells me which objects are being used (so I can later query against the system tables to find the ones that are NOT being used). See the Extended Events session creation code below. I am capturing sp_statement_starting, keeping only SP and function events, and saving only the object name, type, and timestamp; I am NOT saving the SQL text because it is not needed for my purpose.

The sp_statement_starting event fires for every statement inside a stored procedure, so when an SP runs it could raise 1-100 statement-starting events and insert that many records into the file (which is far more data than I need for my purposes).

In my code, after I import the file into the staging table, I shorten the timestamp to smalldatetime and select distinct values from all the records in the file.

I am doing this because a record is inserted for every statement inside an SP; shortening the timestamp to smalldatetime and selecting distinct greatly reduces the number of records inserted.

I know I could just keep the object name, insert only unique values, and ignore the time completely, but I want to see approximately how often the objects are called.

CREATE EVENT SESSION [CaptureObjectUsage_SubmissionEngine] ON SERVER 
ADD EVENT sqlserver.sp_statement_starting(
    -- collect the object name but NOT the statement, that's not needed
    SET collect_object_name=(1),
    collect_statement=(0)
    WHERE (
    -- this is for functions or SP's
        (
            -- functions
            [object_type]=(8272) 
            -- SProcs
            OR [object_type]=(20038)
        ) 
        AND [sqlserver].[database_name]=N'DBNAMEHERE' 
        AND [sqlserver].[is_system]=(0))
    ) 
ADD TARGET package0.event_file( 
    SET filename=N'c:\Path\CaptureObjectUsage.xel'  -- mine was the default path the UI gave me
)
WITH (MAX_MEMORY=4096 KB,EVENT_RETENTION_MODE=ALLOW_SINGLE_EVENT_LOSS,MAX_DISPATCH_LATENCY=30 SECONDS,MAX_EVENT_SIZE=0 KB,MEMORY_PARTITION_MODE=NONE,TRACK_CAUSALITY=OFF,STARTUP_STATE=OFF)
GO
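
For completeness: the session above is created with STARTUP_STATE=OFF, so it has to be started explicitly before it captures anything, for example:

-- start the session so it begins writing to the .xel target
ALTER EVENT SESSION [CaptureObjectUsage_SubmissionEngine] ON SERVER STATE = START;

-- and stop it again once enough data has been captured
-- ALTER EVENT SESSION [CaptureObjectUsage_SubmissionEngine] ON SERVER STATE = STOP;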




-- ***************************************************************************
--      code for importing
-- ***************************************************************************

-- pull data from file path and insert into staging table
INSERT INTO #CaptureObjectUsageFileData (event_data)
SELECT cast(event_data as XML) as event_data
FROM sys.fn_xe_file_target_read_file(@FilePathNameToImport, null, null, null)


-- with the .nodes() predicate the date filtering happens here, so I can insert directly into my final table
INSERT INTO DBALocal.dbo.DBObjectUsageTracking(DatabaseID, ObjectType, ObjectName, ObjectUsageDateTime)
SELECT DISTINCT @DBID, -- @DBID is a variable I set above so I don't need to store the DB name and take up a ton more space
n.value('(data[@name="object_type"]/text)[1]', 'varchar(500)') AS ObjectType,
n.value('(data[@name="object_name"]/value)[1]', 'varchar(500)') as ObjectName,
CAST(n.value('(@timestamp)[1]', 'datetime') AS SMALLDATETIME) AS [utc_timestamp]
from (
    SELECT event_data
    FROM #CaptureObjectUsageFileData (NOLOCK)
) ed
-- original  before adding the .node logic
--CROSS apply ed.event_data.nodes('event') as q(n)
-- updated to reduce amount of data to import
CROSS apply ed.event_data.nodes('event[@timestamp cast as xs:dateTime? > sql:variable("@LastRunDateTime")]') as q(n)
Brad

Old question, but since no one offered a solution using the initial_offset parameter of sys.fn_xe_file_target_read_file, I'll drop some code showing how I used it a few years ago. It's probably not a working solution as pasted, because I cut it out of a larger code base, but it shows everything that is needed to get it working.

-- table to hold the config, i.e. the last file read and the offset.
IF OBJECT_ID('session_data_reader_config', 'U') IS NULL 
CREATE TABLE session_data_reader_config
(
    lock                  bit           PRIMARY KEY 
                                        DEFAULT 1 
                                        CHECK(lock=1) -- to allow only one record in the table
    , file_target_path      nvarchar(260)
    , last_file_read        nvarchar(260)
    , last_file_read_offset bigint
    , file_exists           AS dbo.fn_file_exists(last_file_read) -- user-defined helper (not shown) that returns 1 if the file exists on disk
)


-- Insert the default value to start reading the log files, if no values are already present.
IF NOT EXISTS(SELECT 1 FROM  session_data_reader_config )
INSERT INTO session_data_reader_config (file_target_path,last_file_read,last_file_read_offset)
VALUES ('PathToYourFiles*.xel',NULL,NULL)


-- import the EE data into the staging table
IF EXISTS(SELECT 1 FROM [session_data_reader_config] WHERE file_exists = 1 )
BEGIN
    INSERT INTO [staging_table] ([file_name], [file_offset], [data])
    SELECT t2.file_name, t2.file_offset, t2.event_data --, CAST(t2.event_data as XML)
    FROM [session_data_reader_config]
    CROSS APPLY sys.fn_xe_file_target_read_file(file_target_path,NULL, last_file_read, last_file_read_offset) t2
END 
ELSE
BEGIN
    INSERT INTO [staging_table] ([file_name], [file_offset], [data])
    SELECT t2.file_name, t2.file_offset, t2.event_data
    FROM [session_data_reader_config]
    CROSS APPLY sys.fn_xe_file_target_read_file(file_target_path,NULL, NULL, NULL) t2
END


-- update the config table with the last file and offset
UPDATE [session_data_reader_config]
    SET [last_file_read]        = T.[file_name]
      , [last_file_read_offset] = T.[file_offset]
    FROM (
            SELECT TOP (1) 
                   [file_name]
                 , [file_offset]
              FROM [staging_table]
          ORDER BY [id] DESC 
    ) AS T ([file_name], [file_offset])
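
The snippets above also assume a [staging_table] with an identity column, so the TOP (1) ... ORDER BY [id] DESC at the end can find the most recent row read. A minimal sketch of such a table (this definition is an assumption, not part of the original code):

-- Assumed shape of [staging_table]: an identity key plus the columns the INSERTs above populate.
-- event_data comes back from sys.fn_xe_file_target_read_file as nvarchar(max) and can be cast to XML when parsed later.
IF OBJECT_ID('staging_table', 'U') IS NULL
CREATE TABLE staging_table
(
      id            bigint IDENTITY(1,1) PRIMARY KEY
    , [file_name]   nvarchar(260)
    , [file_offset] bigint
    , [data]        nvarchar(max)
);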