I am currently working on a project that analyses a website's logs using machine learning. I am cleaning the data and want to identify unique visitors to the site.
I don't have much experience with web logs, but it is clear that a single visit retrieves several files (for example, the records in the cs.uri.stem column shown below).
My question: what about when a user goes through several pages (for example, going to page B from a link on page A)? How can I trace that user's behaviour on the site?
Additionally, can anyone suggest a good Python library for analysing web logs?
Much appreciated!!!
date time s.ip cs.method cs.uri.stem cs.uri.query s.port cs.username c.ip sc.status sc.substatus sc.win32.status time.taken device os browser
1 2014-08-05 00:00:03 10.130.0.12 GET / - 80 - 67.205.67.76 200 0 0 1391 Spider Other PingdomBot_1.4
2 2014-08-05 00:00:11 10.130.0.12 GET /about-the-hotel.aspx - 80 - 70.56.59.43 200 0 0 1194 PC Mac_OS_X_10.8 Firefox_31.0
3 2014-08-05 00:00:11 10.130.0.12 GET /~/media/Images/Hotel_ICON_revamp/about+us/a-hotel-unlike-any-others.ashx - 80 - 70.56.59.43 200 0 0 976 PC Mac_OS_X_10.8 Firefox_31.0
4 2014-08-05 00:00:12 10.130.0.12 GET /~/media/Images/Hotel_ICON_revamp/about+us/0713-ExComTeam.ashx - 80 - 70.56.59.43 200 0 0 1620 PC Mac_OS_X_10.8 Firefox_31.0
5 2014-08-05 00:00:12 10.130.0.12 GET /~/media/Images/Hotel_ICON_revamp/about+us/icon/vivienne-tam.ashx - 80 - 70.56.59.43 200 0 0 1713 PC Mac_OS_X_10.8 Firefox_31.0
6 2014-08-05 00:00:12 10.130.0.12 GET /~/media/Images/Hotel_ICON_revamp/about+us/icon/william-lim.ashx - 80 - 70.56.59.43 200 0 0 2387 PC Mac_OS_X_10.8 Firefox_31.0
7 2014-08-05 00:00:14 10.130.0.12 GET /~/media/Images/Hotel_ICON_revamp/about+us/icon/barney-cheng.ashx - 80 - 70.56.59.43 200 0 0 2180 PC Mac_OS_X_10.8 Firefox_31.0
8 2014-08-05 00:00:14 10.130.0.12 GET /~/media/Images/Hotel_ICON_revamp/about+us/icon/tommy-li.ashx - 80 - 70.56.59.43 200 0 0 1146 PC Mac_OS_X_10.8 Firefox_31.0
9 2014-08-05 00:00:14 10.130.0.12 GET /~/media/Images/Hotel_ICON_revamp/about+us/icon/yang-rutherford.ashx - 80 - 70.56.59.43 200 0 0 869 PC Mac_OS_X_10.8 Firefox_31.0
10 2014-08-05 00:00:14 10.130.0.12 GET /~/media/Images/Hotel_ICON_revamp/about+us/icon/justin_wong_img1.ashx - 80 - 70.56.59.43 200 0 0 845 PC Mac_OS_X_10.8 Firefox_31.0
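
To make the question concrete, here is a minimal sketch of one common heuristic, not a definitive method. It assumes a "visitor" can be approximated by the combination of c.ip, os and browser, that requests under /~/media/ are page assets rather than page views, that rows marked Spider should be dropped, and that 30 minutes of inactivity ends a visit. All of these are assumptions, and the helper names (parse_line, is_page_view, sessionize) are made up for illustration. It uses only the Python standard library.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Column names as they appear in the header of the log sample.
COLUMNS = ("date time s.ip cs.method cs.uri.stem cs.uri.query s.port cs.username "
           "c.ip sc.status sc.substatus sc.win32.status time.taken device os browser").split()

# Assumed inactivity timeout; this value is a convention, not something in the logs.
SESSION_GAP = timedelta(minutes=30)


def parse_line(line):
    """Split one whitespace-delimited record, dropping the leading row number."""
    fields = line.split()[1:]
    row = dict(zip(COLUMNS, fields))
    row["timestamp"] = datetime.strptime(
        row["date"] + " " + row["time"], "%Y-%m-%d %H:%M:%S"
    )
    return row


def is_page_view(row):
    """Heuristic (assumption): spiders and /~/media/ assets are not page views."""
    return row["device"] != "Spider" and not row["cs.uri.stem"].startswith("/~/media/")


def sessionize(rows):
    """Group page views by (c.ip, os, browser) and split them on SESSION_GAP."""
    by_visitor = defaultdict(list)
    for row in rows:
        if is_page_view(row):
            by_visitor[(row["c.ip"], row["os"], row["browser"])].append(row)

    sessions = {}
    for visitor, hits in by_visitor.items():
        hits.sort(key=lambda r: r["timestamp"])
        visitor_sessions = [[hits[0]["cs.uri.stem"]]]
        for prev, cur in zip(hits, hits[1:]):
            # A long pause between two page views starts a new session.
            if cur["timestamp"] - prev["timestamp"] > SESSION_GAP:
                visitor_sessions.append([])
            visitor_sessions[-1].append(cur["cs.uri.stem"])
        sessions[visitor] = visitor_sessions
    return sessions


# First three records from the sample above.
sample = [
    "1 2014-08-05 00:00:03 10.130.0.12 GET / - 80 - 67.205.67.76 200 0 0 1391 Spider Other PingdomBot_1.4",
    "2 2014-08-05 00:00:11 10.130.0.12 GET /about-the-hotel.aspx - 80 - 70.56.59.43 200 0 0 1194 PC Mac_OS_X_10.8 Firefox_31.0",
    "3 2014-08-05 00:00:11 10.130.0.12 GET /~/media/Images/Hotel_ICON_revamp/about+us/a-hotel-unlike-any-others.ashx - 80 - 70.56.59.43 200 0 0 976 PC Mac_OS_X_10.8 Firefox_31.0",
]
sessions = sessionize(parse_line(line) for line in sample)
```

Each value in sessions is a list of visits, and each visit is the ordered list of pages requested, so a move from page A to page B would show up as two consecutive entries in the same visit. Several users behind one IP address (or one user whose IP changes) would be misattributed by this key, which is part of what I am unsure about.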