4

I have users' click events stored chronologically on my backend server. I also have session count, session time, etc. My goal is to identify the following:

  1. Which feature do users use most? And which path do users use to reach that feature?
  2. Which paths/features do users rarely traverse?
  3. On which path/feature do users drop off?
  4. What are the bottlenecks in the application?
  5. Etc.

My idea: treat each user event as a node. A user event can be any clickable event (a button or other clickable item). Generate a tree of all the event sequences users can produce. Every node of the tree stores how often the path up to that node was visited. From that tree, we can see how users interact with our application: for example, whether there are paths that most users follow, or paths on which users cannot advance.
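To make the idea concrete, here is a minimal sketch of such a tree in Python. It assumes events have already been grouped into one ordered list per session; the names `PathNode`, `build_path_tree`, `drop_offs` and the sample event names are hypothetical and only illustrate the structure.

```python
from collections import defaultdict


class PathNode:
    """One node per event; count = sessions whose path passed through here."""

    def __init__(self):
        self.count = 0
        self.children = defaultdict(PathNode)


def build_path_tree(sessions):
    """Build a prefix tree from per-session event lists (assumed pre-sorted by time)."""
    root = PathNode()
    for events in sessions:
        node = root
        node.count += 1  # the root counts every session
        for event in events:
            node = node.children[event]
            node.count += 1
    return root


def drop_offs(node, path=()):
    """Yield (path, n) where n sessions ended exactly at the end of that path."""
    continued = sum(child.count for child in node.children.values())
    ended = node.count - continued
    if path and ended:
        yield path, ended
    for event, child in node.children.items():
        yield from drop_offs(child, path + (event,))


# Hypothetical sample data: each inner list is one session's events in order.
sessions = [
    ["home", "search", "product", "checkout"],
    ["home", "search", "product"],
    ["home", "search"],
    ["home", "profile"],
]
tree = build_path_tree(sessions)
for path, ended in drop_offs(tree):
    print(" > ".join(path), ended)
```

Each node's count gives the traffic on that path prefix, and the difference between a node's count and the sum of its children's counts gives the sessions that stopped there.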

Is this a good approach? If not, how can it be improved? If you know of better approaches or good practices, please suggest them; I would be grateful.

1 Answer

-2

First of all, the tasks you list are routinely solved by standard user behavior analytics systems such as Matomo, Umami, Adobe Analytics, Google Analytics and many others.

But they rely on proper event tracking and on enriching the tracked information. With only raw hits stored on the backend, you're basically reinventing the wheel.

However, many companies do something similar to what you're doing for various technical reasons, so there are solutions for that case as well: Business Intelligence tools for analysis and dashboarding, such as Power BI, Tableau, Looker Studio, etc. They're made to connect more or less seamlessly to various data sources and make all the data available for building reports via a GUI. These tools already have all the typical aggregations and funnel reporting one may need, so there's no need to actually build your own tree.

Finally, if your data is in a poor enough state that you need programming data structures to answer even the simplest questions, then yes, a tree sounds like a solution. However, it's not the proper one. Normally, the vast majority, if not all, of trivial analysis (which is what your goals require) is done via simple SQL or its abstractions. But for that, the data has to be enriched. For example, every event has to carry the name of the page where it happened. Every page should have a unique page name tracked with its page view. Every page view event should have the previous page name, and maybe the next one too. Client ids, session ids and the like are needed as well, of course. Expect at least about 20 core fields per event for basic analysis.
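As a rough illustration of what "simple SQL" over enriched events can look like, here is a sketch using Python's built-in `sqlite3` module. The `events` table, its five columns and the sample rows are assumptions made up for the example (a real schema would carry many more fields); the point is only that transitions and exit pages become plain `GROUP BY` queries once each event carries its page name, previous page name and session id.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, heavily reduced schema; real tracking would have far more fields.
conn.execute(
    """CREATE TABLE events (
           session_id TEXT,
           ts INTEGER,
           event_name TEXT,
           page_name TEXT,
           previous_page_name TEXT
       )"""
)
rows = [
    ("s1", 1, "page_view", "home", None),
    ("s1", 2, "page_view", "search", "home"),
    ("s1", 3, "click_buy", "search", "home"),
    ("s2", 1, "page_view", "home", None),
    ("s2", 2, "page_view", "search", "home"),
    ("s3", 1, "page_view", "home", None),
    ("s3", 2, "page_view", "profile", "home"),
]
conn.executemany("INSERT INTO events VALUES (?, ?, ?, ?, ?)", rows)

# Most common page-to-page transitions (which paths users take).
transitions = conn.execute(
    """SELECT previous_page_name, page_name, COUNT(*) AS n
       FROM events
       WHERE event_name = 'page_view' AND previous_page_name IS NOT NULL
       GROUP BY previous_page_name, page_name
       ORDER BY n DESC, page_name"""
).fetchall()

# Exit pages: the page of each session's last event (where users drop off).
exit_pages = conn.execute(
    """SELECT e.page_name, COUNT(*) AS n
       FROM events AS e
       WHERE e.ts = (SELECT MAX(ts) FROM events WHERE session_id = e.session_id)
       GROUP BY e.page_name
       ORDER BY n DESC, e.page_name"""
).fetchall()

print(transitions)
print(exit_pages)
```

The same queries carry over to whatever warehouse or BI tool sits on top of the data, which is why enriching events at tracking time pays off more than building a custom structure afterwards.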

BNazaruk