Let us assume we have a website and, for the sake of argument, that the back end is written in Java.
Let us also assume that we would like to capture clickstream data for the users of our website, tracking things such as:
- IP
- Access time
- Referrer
- User Agent
- etc.
Finally, assume that we have a clickstream web service somewhere with a REST interface, and that it simply saves whatever information we send it to a database.
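To make the setup concrete, here is a minimal sketch of one clickstream event carrying the fields listed above, plus the POST request that would hand it to the REST service. The ClickEvent and ClickStreamClient names, the JSON field names and the endpoint URL are all hypothetical; the real service's API may look quite different. It uses only the JDK (Java 16+ for the record).

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.time.Instant;

/** One clickstream event (hypothetical shape) holding the fields we want to track. */
record ClickEvent(String ip, Instant accessTime, String referrer, String userAgent) {

    /** Serializes the event as a flat JSON object; escaping is minimal, for illustration only. */
    String toJson() {
        return "{"
                + "\"ip\":" + quote(ip) + ","
                + "\"accessTime\":" + quote(accessTime.toString()) + ","
                + "\"referrer\":" + quote(referrer) + ","
                + "\"userAgent\":" + quote(userAgent)
                + "}";
    }

    // Escapes backslashes and quotes only; a null value becomes JSON null.
    private static String quote(String s) {
        if (s == null) {
            return "null";
        }
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }
}

/** Builds requests for the (hypothetical) REST clickstream service. */
final class ClickStreamClient {
    private final URI endpoint; // e.g. https://clickstream.example.com/events (made up)

    ClickStreamClient(URI endpoint) {
        this.endpoint = endpoint;
    }

    /** Wraps the event in a JSON POST request; actually sending it is left to the caller. */
    HttpRequest buildRequest(ClickEvent event) {
        return HttpRequest.newBuilder(endpoint)
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(event.toJson()))
                .build();
    }
}
```

Sending is then a matter of calling HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.discarding()), or sendAsync if the page should not wait for the clickstream service.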
Now, from my (admittedly limited) point of view, I see two issues:
- How do we ensure that clickstream data is always captured and cannot be circumvented by the user?
- How do we make the clickstream service portable, i.e. easy to hook into any website?
At the moment, I see two ways of implementing clickstream capture, but both have flaws:
- Use JavaScript to send the clickstream data. This makes it portable: you can hook it into any website without changing the back-end code. The only changes needed are minor ones in the HTML.
For example, have an HTML page with
<body onload="captureAndSendClickStreamData();">...</body>
where captureAndSendClickStreamData() is a function in the Clickstream.js file you included.
Obviously, this approach is easily portable, right? But what if the user disables JavaScript? In essence, they are then blocking the clickstream service you worked so hard on.
- Capture the clickstream data on the server, in some ClickStreamServletFilter class. The obvious advantage is that the end user has no knowledge of it and can't really disable it. However, to add your clickstream service to another site you have to modify its back end, which gets even messier if that site is not written in Java.
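The server-side approach could look roughly like the sketch below. A real ClickStreamServletFilter would implement javax.servlet.Filter (or jakarta.servlet.Filter) and be mapped to /* in web.xml; to keep the sketch runnable with the JDK alone, it uses the analogous com.sun.net.httpserver.Filter instead. The in-memory queue is a hypothetical stand-in for the call to the REST service, and the log-line format is made up.

```java
import com.sun.net.httpserver.Filter;
import com.sun.net.httpserver.HttpExchange;
import java.io.IOException;
import java.time.Instant;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

/** Records one clickstream line per request, then lets the request through unchanged. */
class ClickStreamFilter extends Filter {
    // Hypothetical sink standing in for the REST clickstream service.
    final Queue<String> captured = new ConcurrentLinkedQueue<>();

    @Override
    public void doFilter(HttpExchange exchange, Chain chain) throws IOException {
        String ip = exchange.getRemoteAddress().getAddress().getHostAddress();
        // HTTP spells the header "Referer"; both headers may be absent (getFirst returns null).
        String referrer = exchange.getRequestHeaders().getFirst("Referer");
        String userAgent = exchange.getRequestHeaders().getFirst("User-Agent");
        captured.add(Instant.now() + " " + ip + " " + exchange.getRequestURI()
                + " referrer=" + referrer + " ua=" + userAgent);
        chain.doFilter(exchange); // continue to the page handler as usual
    }

    @Override
    public String description() {
        return "Captures clickstream data for every request";
    }
}
```

Wiring it up is one line, server.createContext("/", handler).getFilters().add(new ClickStreamFilter()), which plays roughly the role of the filter mapping in web.xml. Because this runs on the server, the user cannot switch it off, but it also has to be installed in each back end separately.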
So, my final questions are:
- Are there any other not-so-obvious (dis)advantages to the approaches mentioned?
- Are there any other viable approaches?
- How do the big guns, like Google, Facebook and Amazon, handle this?
Thank you for your time :)