0

I have a set of tables which track access logs. The logs contain data about the user's access including user agent strings. Since we know that user agent strings are, for all intents and purposes, practically unlimited, these would need to be stored as a text/blob type. Given the high degree of duplication, I'd like to store these in a separate reference table and have my main access log table have an id linking to it. Something like this:

accesslogs table:
username|accesstime|ipaddr|useragentid

useragents table:
id|crc32|md5|useragent
(the hashes are for indexing and quicker searching)

Here's the catch, i am working inside a framework that doesn't give me access to create fancy things like foreign keys. In addition, this has to be portable across multiple DBMSs. I have the join logic worked out for doing SELECTS but I am having trouble figuring out how to insert properly. I want to do something like

INSERT INTO accesslogs (username, accesstime, ipaddr, useragentid)
VALUES
(
  :username,
  :accesstime,
  :ipaddr,
  (
    CASE WHEN
      (
        SELECT id
        FROM useragents
        WHERE
          useragents.crc32 = :useragentcrc32
          AND
          useragents.md5 = :useragentmd5
          AND useragents.useragent LIKE :useragent
      ) IS NOT NULL
    THEN
      THAT_SAME_SELECT_FROM_ABOVE()
    ELSE
      GET_INSERT_ID_FROM(INSERT INTO useragents (crc32, md5, useragent) VALUES (:useragentcrc32, :useragentmd5, :useragent))
  )
)

Is there any way to do this that doesn't use pseudofunctions whose names i just made up? The two parts i'm missing is how to get the select from above and how to get the new id from a subquery insert.

Community
  • 1
  • 1
Dan
  • 31
  • 4

2 Answers2

0

I'm not sure there is a cross platform way of doing this. You may have to have a lot of special cases for each supported back end. For Example, for SQL Server you'd use the merge statement as the basis of the solution. Other DBMSs have different names if they support it at all. Searching for "Upsert" might help.

Edt - added the second query to be explicit, and added parameters.

-- SQL Server Example
--Schema Defs
Create Table Test (
  id int not null identity primary key,
  UserAgent nvarchar(50)
)

Create Table WebLog (
  UserName nvarchar(50),
  APAddress nvarchar(50),
  UserAgentID int
)

Create Unique Index UQ_UserAgent On Test(UserAgent)

-- Values parsed from log
Declare 
  @UserName nvarchar(50) = N'Loz',
  @IPAddress nvarchar(50) = N'1.1.1.1',
  @UserAgent nvarchar(50) = 'Test'
Declare @id int

-- Optionally Begin Transaction
-- Insert if necessary and get id 
Merge 
  Into dbo.Test as t
Using
  (Select @UserAgent as UserAgent) as s
On
  t.[UserAgent] = s.[UserAgent]
When Matched Then
  Update Set @id = t.id
When Not Matched Then
  Insert (UserAgent) Values (s.UserAgent);

If @id Is Null Set @id = scope_identity()

Insert Into WebLog (UserName, IPAddress, UserAgentID) Values (@UserName, @IPAddress, @id)

-- Optionally Commit Transaction
Laurence
  • 10,896
  • 1
  • 25
  • 34
  • i think you missed the part about how it's part of an insert into the main table. There's no merge/upserting. If the value is in the user agent table, nothing there changes and i need that value. If it's not in there, then it needs to be inserted, and i need the value of the new id. – Dan Oct 25 '12 at 10:50
  • You'd have to do a separate insert for the main table, this just gives you the id to use. if you need them to be a single unit of work, use a transaction. – Laurence Oct 25 '12 at 10:58
  • "If the value is in the user agent table, nothing there changes and i need that value. If it's not in there, then it needs to be inserted, and i need the value of the new id" - That's exactly what this does in SQL server, it's almost the definition of upsert. – Laurence Oct 25 '12 at 11:05
0

You will need to do separate inserts to each of the tables. You can not do insert into both at the same time.

If you use MS SQL Server once you inserted you can get inserted id by SCOPE_IDENTITY(), and then use it in another table insert.

Farfarak
  • 1,497
  • 1
  • 8
  • 8
  • If i didn't have to be portable and was writing only for MS, i'd use INSERT with OUTPUT and it'd take care of the whole thing. – Dan Oct 25 '12 at 11:24