I am currently trying to analyze network data with pandas. I have been reading other posts, and the closest to my problem is "Pandas - Find and index rows that match row sequence pattern".
I am trying to check whether some of the packets are lost and to count the number of lost packets. To do that, I would like to define a window (or matrix), here 2x2. Then I define a pattern; in this case it would be .
Next I want to check whether each window matches that pattern exactly. If possible, the result should go into an extra column containing True or False (or NaN). My attempts so far are in the following code examples.
In the first example I tried to do the check by iterating over the rows. My third example is closer to what I am looking for: with rolling I define a window and a pattern that the rows should be checked against, but I get an error because the pattern is a string. This is what I would like it to look like.
import pandas as pd
df = pd.read_csv('hallo')
Here I filter out the interference (everything that is not ICMP):
Protocol_filtered = df[df['Protocol']== 'ICMP']
Protocol_filtered1 = Protocol_filtered[['Time','Source','Destination','Info']]
Protocol_filtered1 = Protocol_filtered1.reset_index(drop=True)
Then I start checking for lost packets:
s0 = 0
s1 = 1
for row in Protocol_filtered1.iterrows():
    while s1 <= len(Protocol_filtered1):
        source = Protocol_filtered1.loc[s0,'Source']
        dest = Protocol_filtered1.loc[s1,'Destination']
        if source == dest:
            Protocol_filtered1['Check']= True
        else:
            Protocol_filtered1['Check']= False
        source1 = Protocol_filtered1.loc[s1,'Source']
        dest1 = Protocol_filtered1.loc[s0,'Destination']
        if source1 == dest1:
            Protocol_filtered1['Check1']= True
        else:
            Protocol_filtered1['Check1']= False
        s0 = s0 + 2
        s1 = s1 + 2
The result of this code is not what I wanted: for example, it gives me True in row 2, where it should be False.
The logic of the following code is correct, but it evaluates i for every row, while it should always check two successive rows together (0 & 1, 2 & 3, 4 & 5, ...):
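A likely cause of the unexpected values in the loop above is that `Protocol_filtered1['Check'] = True` assigns the scalar to the whole column, so every row ends up with the outcome of the last comparison. The sketch below writes the result only into the two rows of the current pair instead. It is a minimal illustration, not a tested fix for the real capture: the DataFrame `frame` and its rows are made-up toy data, and it assumes a default integer index as produced by `reset_index(drop=True)`.

```python
import pandas as pd

# Toy data standing in for the filtered ICMP capture (rows are made up).
a, b = '192.168.20.35', '192.168.20.31'
frame = pd.DataFrame({
    'Source':      [a, b, a, a, a, b],
    'Destination': [b, a, b, b, b, a],
})

frame['Check'] = False  # create the column once, then fill it pair by pair

# Step through the rows in non-overlapping pairs: (0, 1), (2, 3), (4, 5), ...
for s0 in range(0, len(frame) - 1, 2):
    s1 = s0 + 1
    # A pair is complete if the second row is the mirror image of the first.
    is_reply = (frame.loc[s0, 'Source'] == frame.loc[s1, 'Destination']
                and frame.loc[s1, 'Source'] == frame.loc[s0, 'Destination'])
    # .loc slicing is label-based and inclusive, so this writes both rows of the pair.
    frame.loc[s0:s1, 'Check'] = is_reply

check_loop = frame['Check'].tolist()
```

Using `range(0, len(frame) - 1, 2)` also avoids reading a row label that does not exist when the number of rows is odd.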
pattern = ['192.168.20.35', '192.168.20.31']
i = (Protocol_filtered1['Source'] == '192.168.20.35') & (Protocol_filtered1['Source'].shift(-1) == '192.168.20.31')
i &= (Protocol_filtered1['Destination'] == '192.168.20.31') & (Protocol_filtered1['Destination'].shift(-1)== '192.168.20.35')
Protocol_filtered1.index[i]
Protocol_filtered1 ['Check1'] = i
The result here is (It should be: Check: True, True, False, False, True, True):
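The pairwise check described above (rows 0 & 1, 2 & 3, 4 & 5, ...) could be sketched by keeping the shift-based comparison and then discarding every result that was computed at the second row of a pair. This is only a sketch under assumptions: `frame` is made-up toy data, the index is the default 0..n-1, and the names `first_of_pair` and `pair_ok` are introduced here for illustration.

```python
import numpy as np
import pandas as pd

# Toy data standing in for the filtered ICMP capture (rows are made up).
a, b = '192.168.20.35', '192.168.20.31'
frame = pd.DataFrame({
    'Source':      [a, b, a, a, a, b],
    'Destination': [b, a, b, b, b, a],
})

# Same comparison as before: this row is a -> b and the next row is b -> a.
match_here = ((frame['Source'] == a) & (frame['Source'].shift(-1) == b)
              & (frame['Destination'] == b) & (frame['Destination'].shift(-1) == a))

# Only rows 0, 2, 4, ... start a window; results at odd rows are ignored.
first_of_pair = np.arange(len(frame)) % 2 == 0
pair_ok = match_here & first_of_pair

# Copy the result of each pair's first row onto its second row.
frame['Check1'] = pair_ok | pair_ok.shift(1, fill_value=False)
```

With this toy data the middle pair (rows 2 and 3) is the only one flagged False, because both rows have the same direction.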
A very elegant solution that I found in the forum and tried to apply is:
pattern = ['192.168.20.35', '192.168.20.31']
obs = len(pattern)
Protocol_filtered1['S1'] = (Protocol_filtered1['Source']
                            .rolling(window = obs, min_periods = )
                            .apply(lambda x: (x==pattern).all())
                            .astype(bool)
                            .shift(-1*(obs-1)))
But there also seems to be a problem in my code. I prefer the last solution, where I can define a pattern and a window size and let pandas go over the whole DataFrame, so that I can then count the number of lost packets with isnull().
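Since `rolling` only aggregates numeric data, one possible workaround is to map each address to an integer code first and compare the codes inside the window. The sketch below is hedged accordingly: `frame` is made-up toy data, `min_periods=obs` is an assumption (the value is not given above), only the `Source` column is checked, and lost windows are counted by looking at non-overlapping window starts rather than with `isnull()`.

```python
import numpy as np
import pandas as pd

# Toy data standing in for the filtered ICMP capture (rows are made up).
a, b = '192.168.20.35', '192.168.20.31'
frame = pd.DataFrame({'Source': [a, b, a, a, a, b]})

pattern = [a, b]
obs = len(pattern)

# Map every address to an integer code so that rolling() accepts the column.
codes, uniques = pd.factorize(frame['Source'])
code_of = {addr: i for i, addr in enumerate(uniques)}
# Addresses that never occur in the column get -1 and therefore never match.
pattern_codes = np.array([code_of.get(addr, -1) for addr in pattern], dtype=float)

matches = (pd.Series(codes, index=frame.index, dtype=float)
           .rolling(window=obs, min_periods=obs)   # min_periods=obs is an assumption
           .apply(lambda x: float((x == pattern_codes).all()), raw=True)
           .shift(-(obs - 1)))                     # align the result with the window start

# One entry per non-overlapping window: rows 0, 2, 4, ...
window_starts = matches.iloc[::obs]
# Windows that do not match (or are incomplete, i.e. NaN) count as lost.
lost_windows = int((window_starts != 1.0).sum())
```

The same encoding could be applied to `Destination` and the two results combined with `&` to cover the full 2x2 window.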
I would really appreciate some help! Thank you very much!