
I am writing a website with Sinatra and Heroku, and I want to find a way to track every visit to my site. I have looked at existing analytics programs (e.g. Google Analytics) and have chosen not to use them because I would like to learn how to do this myself.

My definition of a visit:

A visit happens when someone or something (robot) visits your site. It consists of one or more page views/ hits. One visitor can make multiple visits to your site.

Source: http://www.opentracker.net/article/hits-or-pageviews

For each visit, I would like to track:

  1. Visitor IP address
  2. Time visit began (page was opened)
  3. Time visit ended (page was closed)

This website is not viewed very often, so I would like to log each visit in a Postgres database accessed with ActiveRecord. Logging would work like this:

  1. User accesses page
  2. Session is started; ip, mac_address, time, and visit_id are logged in Visit
  3. Each page viewed is logged in PageView
  4. User closes page
  5. Session is cleared; time and visit_id are logged in Visit
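
To make the five steps above concrete, here is a minimal in-memory sketch in plain Ruby. The names `VisitLog`, `open_visit`, `log_page_view` and `close_visit` are hypothetical, and the Structs only stand in for the database tables; a real version would write to Postgres instead.

```ruby
# Plain-Ruby stand-ins for the Visit and PageView records (hypothetical names).
Visit = Struct.new(:id, :ip, :started_at, :ended_at, :page_views)
PageView = Struct.new(:visit_id, :page, :time)

# Hypothetical in-memory log; a real app would write to the database instead.
class VisitLog
  def initialize
    @visits = {}
    @next_id = 1
  end

  # Steps 1-2: a new session starts, so record the IP and start time.
  def open_visit(ip, time)
    visit = Visit.new(@next_id, ip, time, nil, [])
    @visits[visit.id] = visit
    @next_id += 1
    visit
  end

  # Step 3: record each page viewed against its visit.
  def log_page_view(visit_id, page, time)
    @visits.fetch(visit_id).page_views << PageView.new(visit_id, page, time)
  end

  # Steps 4-5: the page was closed, so record the end time.
  def close_visit(visit_id, time)
    @visits.fetch(visit_id).ended_at = time
  end
end

log = VisitLog.new
visit = log.open_visit("203.0.113.7", Time.utc(2015, 1, 1, 12, 0, 0))
log.log_page_view(visit.id, "/", Time.utc(2015, 1, 1, 12, 0, 0))
log.log_page_view(visit.id, "/about", Time.utc(2015, 1, 1, 12, 2, 0))
log.close_visit(visit.id, Time.utc(2015, 1, 1, 12, 5, 0))
```

The hard part in practice is step 4: nothing tells the server that the page was closed, so `close_visit` needs some signal from the browser.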

DATABASE FORMAT

  • Visits (Table)
    • ip (Column, string)
    • mac_address (Column, string)
    • visit_id (Column, int)
    • time (Column, datetime)
  • PageViews (Table)
    • page (Column, string)
    • time (Column, datetime)
    • visit_id (Column, int)

Sample Migration File:

class Main < ActiveRecord::Migration
  def change
    create_table :visits do |item|
      item.string :ip
      item.string :mac_address
      item.datetime :time
      item.integer :visit_id
    end
    create_table :pageviews do |item|
      item.integer :visit_id
      item.string :page
      item.datetime :time
    end
  end
end
thesecretmaster

1 Answer

For each visit, I would like to track:

  1. Visitor IP address
  2. Time visit began (page was opened)
  3. Time visit ended (page was closed)

You also had MAC addresses in the list before, but just to reiterate: they aren't used to route traffic across the internet, only on the local network, so your server never sees the visitor's MAC address. It would be close to pointless to save that information even if you could get at it.

HTTP is a stateless protocol, so the server is never told when a page is closed. That means #3 is not possible via HTTP alone, but it can be done with JavaScript. Probably the easiest way is to have the page poll the server at an acceptable interval, updating the end time on each poll.
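
As a rough sketch of the server side of that polling idea, the helper below keeps only the latest ping per visit and treats it as the end time. `Heartbeats`, `POLL_INTERVAL` and the "two missed intervals" rule are assumptions for illustration, not part of Sinatra or any library.

```ruby
# Hypothetical helper: the client pings every POLL_INTERVAL seconds while the
# page is open, and the server keeps only the latest ping per visit.
POLL_INTERVAL = 30 # seconds; an assumed value, tune to taste

class Heartbeats
  def initialize
    @last_seen = {}
  end

  # Called whenever a ping arrives for a visit.
  def ping(visit_id, time)
    @last_seen[visit_id] = time
  end

  # The best available estimate of when the visit ended.
  def end_time(visit_id)
    @last_seen[visit_id]
  end

  # A visit counts as over once two polling intervals pass without a ping.
  def ended?(visit_id, now)
    last = @last_seen[visit_id]
    !last.nil? && now - last > 2 * POLL_INTERVAL
  end
end

beats = Heartbeats.new
start = Time.utc(2015, 1, 1, 12, 0, 0)
beats.ping(1, start)
beats.ping(1, start + 30)
still_open = beats.ended?(1, start + 45)  # only 15 seconds since the last ping
closed = beats.ended?(1, start + 120)     # 90 seconds of silence
```

The recorded end time is only accurate to within one polling interval, which is the trade-off for not needing a reliable "page closed" event.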

#1 and #2 are already caught by your basic server logs, they'd be what I'd use - why duplicate effort? - but I'll add how to use Sinatra to do it via the model.

If you use a before filter you can catch #1 and #2 easily. The Request object gives you the IP address and the path; you'll also need the current time, and some way to tell different users behind the same IP apart, which is what the session is for here:

before do
  # Don't log the end-time call itself as a new visit.
  next if request.path_info == "/analytics"

  # This is pseudocode, Sequel style; you can work this bit out
  # for ActiveRecord.
  # Look the user up by the session_id stored in the session (not the
  # primary key), creating a new user on the first request.
  stored = session[:user]
  user = (stored && User[session_id: stored]) || User.create

  # You may want to check if there's an existing visit for this page,
  # as refreshes would run this again. It's up to you.
  visit = Visit.create(page: request.path, ip: request.ip, start: Time.now)
  user.add_visit visit
  session[:analytics] = visit.id
  session[:user] = user.session_id # *don't* just bung the
                                   # user id in there
end

You'll need a route to log the end time to:

require "time" # for Time.rfc2822

patch "/analytics", :provides => :json do
  visit_id = session[:analytics]
  user = User[ :session_id => session[:user] ]
  # Sequel: query the association's dataset rather than the loaded array
  visit = user.visits_dataset.first(:id => visit_id)
  # params[:end] is expected to be an RFC 2822 date string
  visit.end = Time.rfc2822(params[:end])
  visit.save
  halt 204 # take your pick of success numbers
           # you should also check for errors
           # and check the input is valid
           # and you may want to return some JSON to the
           # calling javascript.
  # Also think about how to restrict access to this
  # route to only authorised callers. Since you're providing the
  # javascript, you can place variables in them by generating
  # parts on the fly and serving it via a Sinatra route etc.
end
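
For the "check the input is valid" part, one possible approach is to parse the timestamp defensively before touching the model. `parse_end_time` is a hypothetical helper name, and this assumes the client sends an RFC 2822 date string; `Time.rfc2822` comes from Ruby's standard `time` library.

```ruby
require "time" # adds Time.rfc2822 and Time#rfc2822

# Hypothetical helper: returns a Time for a valid RFC 2822 string, else nil.
def parse_end_time(raw)
  return nil unless raw.is_a?(String)
  Time.rfc2822(raw)
rescue ArgumentError
  nil
end

stamp = Time.utc(2015, 1, 1, 12, 5, 0).rfc2822
parsed = parse_end_time(stamp)         # a Time equal to the original
rejected = parse_end_time("yesterday") # nil, not an exception
```

In the route you could then respond with something like `halt 400` when the helper returns nil, instead of letting the exception surface as a 500.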

I'm not going to write the javascript, that should be straightforward.

Note, I basically pulled this code out of my backside so consider any or all of it likely to break and be shaky, but it's so you get the idea. Like I mentioned above, I'd probably cut most of this and use the logs and some judicious regex.
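
To give a flavour of the logs-and-regex route, here is a small sketch that groups log lines into visits per IP. It assumes Common Log Format lines and a 30 minute inactivity gap between visits; both are assumptions, and your server's or Heroku's router log lines will likely need a different regex. `LOG_LINE`, `VISIT_GAP` and `visits_from_log` are made-up names.

```ruby
require "time" # for Time.strptime

# Assumed log layout (Common Log Format); adjust the regex to your server.
LOG_LINE = %r{\A(?<ip>\S+) \S+ \S+ \[(?<time>[^\]]+)\] "(?<method>\S+) (?<path>\S+)}
VISIT_GAP = 30 * 60 # seconds of inactivity that ends a visit (assumed)

# Returns an array of { ip:, started_at:, ended_at:, pages: } hashes.
def visits_from_log(lines)
  hits = lines.map do |line|
    m = LOG_LINE.match(line)
    next unless m # skip lines that don't look like requests
    { ip: m[:ip], path: m[:path],
      time: Time.strptime(m[:time], "%d/%b/%Y:%H:%M:%S %z") }
  end.compact

  visits = []
  open_visits = {} # ip => most recent visit for that ip
  hits.sort_by { |h| h[:time] }.each do |hit|
    visit = open_visits[hit[:ip]]
    # Start a new visit on the first hit, or after a long silence.
    if visit.nil? || hit[:time] - visit[:ended_at] > VISIT_GAP
      visit = { ip: hit[:ip], started_at: hit[:time], ended_at: hit[:time], pages: [] }
      visits << visit
      open_visits[hit[:ip]] = visit
    end
    visit[:ended_at] = hit[:time]
    visit[:pages] << hit[:path]
  end
  visits
end

sample = [
  '203.0.113.7 - - [01/Jan/2015:12:00:00 +0000] "GET / HTTP/1.1" 200 512',
  '203.0.113.7 - - [01/Jan/2015:12:02:00 +0000] "GET /about HTTP/1.1" 200 300',
  '203.0.113.7 - - [01/Jan/2015:14:00:00 +0000] "GET / HTTP/1.1" 200 512'
]
visits = visits_from_log(sample)
```

Note that the "end" here is only the time of the last request in a visit, not the moment the page was closed, so it answers #1 and #2 but only approximates #3.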

ian