
I am currently working on a project that analyses the web logs of a website using machine learning. I am cleaning the data and want to identify the unique visitors to this site.

I don't have much experience with web logs, but it is clear that a single visit retrieves several files (for example, the records in the cs.uri.stem column shown below).

My question: what happens when a user goes through several pages (for example, going to page B from a link on page A)? How can I track that user's behaviour on the site?

Additionally, can anyone suggest a good Python library for analysing web logs?

Much appreciated!!!

         date     time        s.ip cs.method cs.uri.stem                                                               cs.uri.query s.port cs.username         c.ip sc.status sc.substatus sc.win32.status time.taken device            os          browser
1  2014-08-05 00:00:03 10.130.0.12       GET /                                                                                    -     80           - 67.205.67.76       200            0               0       1391 Spider         Other   PingdomBot_1.4
2  2014-08-05 00:00:11 10.130.0.12       GET /about-the-hotel.aspx                                                                -     80           -  70.56.59.43       200            0               0       1194     PC Mac_OS_X_10.8     Firefox_31.0
3  2014-08-05 00:00:11 10.130.0.12       GET /~/media/Images/Hotel_ICON_revamp/about+us/a-hotel-unlike-any-others.ashx            -     80           -  70.56.59.43       200            0               0        976     PC Mac_OS_X_10.8     Firefox_31.0
4  2014-08-05 00:00:12 10.130.0.12       GET /~/media/Images/Hotel_ICON_revamp/about+us/0713-ExComTeam.ashx                       -     80           -  70.56.59.43       200            0               0       1620     PC Mac_OS_X_10.8     Firefox_31.0
5  2014-08-05 00:00:12 10.130.0.12       GET /~/media/Images/Hotel_ICON_revamp/about+us/icon/vivienne-tam.ashx                    -     80           -  70.56.59.43       200            0               0       1713     PC Mac_OS_X_10.8     Firefox_31.0
6  2014-08-05 00:00:12 10.130.0.12       GET /~/media/Images/Hotel_ICON_revamp/about+us/icon/william-lim.ashx                     -     80           -  70.56.59.43       200            0               0       2387     PC Mac_OS_X_10.8     Firefox_31.0
7  2014-08-05 00:00:14 10.130.0.12       GET /~/media/Images/Hotel_ICON_revamp/about+us/icon/barney-cheng.ashx                    -     80           -  70.56.59.43       200            0               0       2180     PC Mac_OS_X_10.8     Firefox_31.0
8  2014-08-05 00:00:14 10.130.0.12       GET /~/media/Images/Hotel_ICON_revamp/about+us/icon/tommy-li.ashx                        -     80           -  70.56.59.43       200            0               0       1146     PC Mac_OS_X_10.8     Firefox_31.0
9  2014-08-05 00:00:14 10.130.0.12       GET /~/media/Images/Hotel_ICON_revamp/about+us/icon/yang-rutherford.ashx                 -     80           -  70.56.59.43       200            0               0        869     PC Mac_OS_X_10.8     Firefox_31.0
10 2014-08-05 00:00:14 10.130.0.12       GET /~/media/Images/Hotel_ICON_revamp/about+us/icon/justin_wong_img1.ashx                -     80           -  70.56.59.43       200            0               0        845     PC Mac_OS_X_10.8     Firefox_31.0
Adam Liu
  • You can identify them by the combination of IP, OS and browser – RaminNietzsche Apr 03 '17 at 04:51
  • Can you be more specific? – Adam Liu Apr 03 '17 at 04:52
  • Questions asking us to recommend or find a book, tool, software library, tutorial or other off-site resource are off-topic for Stack Overflow. – DYZ Apr 03 '17 at 04:52
  • @AdamLeo "how about when a user goes through several pages?" You can change your logging to record the referrer, or group requests by user IP. If one IP with a particular OS and browser visited a page and then a second page, that visitor probably reached the second page from a link on the first – RaminNietzsche Apr 03 '17 at 04:59

1 Answer

It may be a good idea to look at the pandas library. Once you have loaded the data using pandas (see example here), it should be straightforward to find the unique elements conditioned on one or more columns (example here).
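
To make that concrete, here is a minimal sketch of how this could look with pandas. It treats each distinct (c.ip, os, browser) combination as one visitor, as suggested in the comments, which is only a heuristic: several people behind one shared IP with the same browser will be merged. The 30-minute session gap, the `/~/media/` prefix used to skip image requests, the helper names, and the `/rooms.aspx` row are all assumptions made for illustration; the other rows are copied from the question.

```python
import pandas as pd

COLUMNS = ["date", "time", "cs.uri.stem", "c.ip", "device", "os", "browser"]
MEDIA = "/~/media/Images/Hotel_ICON_revamp/about+us/"
MAC_FF = ("70.56.59.43", "PC", "Mac_OS_X_10.8", "Firefox_31.0")

sample = pd.DataFrame(
    [
        ("2014-08-05", "00:00:03", "/", "67.205.67.76", "Spider", "Other", "PingdomBot_1.4"),
        ("2014-08-05", "00:00:11", "/about-the-hotel.aspx", *MAC_FF),
        ("2014-08-05", "00:00:11", MEDIA + "a-hotel-unlike-any-others.ashx", *MAC_FF),
        ("2014-08-05", "00:00:12", MEDIA + "0713-ExComTeam.ashx", *MAC_FF),
        # Hypothetical row (not in the original log) to show a second page view.
        ("2014-08-05", "00:02:30", "/rooms.aspx", *MAC_FF),
    ],
    columns=COLUMNS,
)

VISITOR_KEY = ["c.ip", "os", "browser"]   # heuristic visitor identity
SESSION_GAP = pd.Timedelta(minutes=30)    # assumed inactivity timeout


def add_visitor_and_session(df):
    """Return a copy with timestamp, visitor_id and session_id columns."""
    df = df.copy()
    df["timestamp"] = pd.to_datetime(df["date"] + " " + df["time"])
    df = df.sort_values(VISITOR_KEY + ["timestamp"]).reset_index(drop=True)
    # One integer id per distinct (c.ip, os, browser) combination.
    df["visitor_id"] = df.groupby(VISITOR_KEY, sort=False).ngroup()
    # A session starts at a visitor's first request or after a long gap.
    gap = df.groupby("visitor_id")["timestamp"].diff()
    df["session_id"] = (gap.isna() | (gap > SESSION_GAP)).cumsum()
    return df


def page_paths(df):
    """Ordered list of page requests per session, skipping media and spiders."""
    is_page = ~df["cs.uri.stem"].str.startswith("/~/media/")
    is_human = df["device"] != "Spider"
    pages = df[is_page & is_human]
    return pages.groupby("session_id")["cs.uri.stem"].agg(list)


logs = add_visitor_and_session(sample)
unique_visitors = logs.loc[logs["device"] != "Spider", "visitor_id"].nunique()
paths = page_paths(logs)
print(unique_visitors)
print(paths)
```

Each entry in `paths` is the sequence of pages one visitor requested within one session, which is a rough stand-in for "went from page A to page B". If the server can be configured to log the referrer, that field would give a more reliable link between consecutive pages than timing alone.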

ajmartin