I am using pandas to analyse existing ssh sessions to different nodes. To do so, I have parsed the ssh daemon log into a DataFrame that contains the following columns:
- Node: the name of the node where the connection was established
- Session: the ID of the session
- Start: timestamp indicating when the connection started
- Finish: timestamp indicating when the connection ended
Here's a part of the data:
In [375]: sessions[1:10]
Out[375]:
    Node  Session               Start              Finish
1  svg01    27321 2015-02-23 07:24:45 2015-02-23 07:50:57
2  svg02    14171 2015-02-23 10:25:08 2015-02-23 14:33:24
3  svg02    14273 2015-02-23 10:26:21 2015-02-23 14:36:19
4  svg01    14401 2015-02-23 10:28:16 2015-02-23 14:38:04
5  svg01    26408 2015-02-23 14:01:49 2015-02-23 18:38:25
6  svg03    13722 2015-02-23 18:24:39 2015-02-23 20:51:59
7  svg05    17637 2015-02-23 19:10:00 2015-02-23 19:10:20
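To make the example reproducible, the rows shown above can be rebuilt with something like the sketch below. The literal construction is only an assumption about how to recreate the sample; the real DataFrame comes from the parsed log.

```python
import pandas as pd

# Rebuild the sample rows shown above (values copied from the output).
sessions = pd.DataFrame({
    "Node": ["svg01", "svg02", "svg02", "svg01", "svg01", "svg03", "svg05"],
    "Session": [27321, 14171, 14273, 14401, 26408, 13722, 17637],
    "Start": [
        "2015-02-23 07:24:45", "2015-02-23 10:25:08", "2015-02-23 10:26:21",
        "2015-02-23 10:28:16", "2015-02-23 14:01:49", "2015-02-23 18:24:39",
        "2015-02-23 19:10:00",
    ],
    "Finish": [
        "2015-02-23 07:50:57", "2015-02-23 14:33:24", "2015-02-23 14:36:19",
        "2015-02-23 14:38:04", "2015-02-23 18:38:25", "2015-02-23 20:51:59",
        "2015-02-23 19:10:20",
    ],
})

# Parse the timestamp columns so they can be compared as datetimes.
sessions["Start"] = pd.to_datetime(sessions["Start"])
sessions["Finish"] = pd.to_datetime(sessions["Finish"])

# Match the row labels 1..7 seen in the output above.
sessions.index = range(1, 8)
```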
I want to generate an additional column that holds the number of sessions already established on the same node at the moment each new connection starts.
Without taking the Node into account, I can compute this using:
count_sessions = lambda t: sessions[(sessions.Start<t) & (sessions.Finish>t)].shape[0]
sessions['OpenSessions'] = sessions['Start'].map(count_sessions)
The problem is that I also need to take the 'Node' column value into account, but I do not know how to get it inside the mapped function.
I could use the index of the element in the Series to look up the node in the sessions DataFrame, but I did not find any way to retrieve the index of the element passed to map.
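To illustrate the result I am after, here is a minimal sketch of one possible direction: applying a function row by row with `DataFrame.apply(..., axis=1)`, so that both `Start` and `Node` are available for each session. The helper name `count_open_on_node` and the four-row subset are hypothetical, and I have not checked whether this scales to a large log.

```python
import pandas as pd

# Hypothetical four-row subset of the data shown above.
sessions = pd.DataFrame({
    "Node": ["svg02", "svg02", "svg01", "svg01"],
    "Session": [14171, 14273, 14401, 26408],
    "Start": ["2015-02-23 10:25:08", "2015-02-23 10:26:21",
              "2015-02-23 10:28:16", "2015-02-23 14:01:49"],
    "Finish": ["2015-02-23 14:33:24", "2015-02-23 14:36:19",
               "2015-02-23 14:38:04", "2015-02-23 18:38:25"],
})
sessions["Start"] = pd.to_datetime(sessions["Start"])
sessions["Finish"] = pd.to_datetime(sessions["Finish"])

def count_open_on_node(row):
    """Count sessions on row's node that were open when row's session started."""
    same_node = sessions["Node"] == row["Node"]
    open_at_start = (sessions["Start"] < row["Start"]) & (sessions["Finish"] > row["Start"])
    return int((same_node & open_at_start).sum())

# axis=1 passes each row as a Series, so the Node value is accessible.
sessions["OpenSessions"] = sessions.apply(count_open_on_node, axis=1)
```

This compares every row against the whole frame, so the cost grows quadratically with the number of sessions.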