
I am currently trying to analyze network data with pandas. I have read other posts, and the closest one to my problem is "Pandas - Find and index rows that match row sequence pattern".

My DataFrame looks like this: [image: DataFrame]

I am trying to check whether some packets are lost and to count the number of lost packets. To do this, I would like to define a window (or matrix), here 2x2, and then define a pattern; in this case it would be [image: pattern].

Now I want to check whether each window exactly matches this recurring pattern. If possible, the result should go into an extra column containing True or False (or NaN). I have already tried this in the code examples below.
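To make the goal concrete, here is a minimal sketch of checking one 2x2 window (two consecutive rows of the `Source` and `Destination` columns) against a 2x2 pattern. The toy data, the use of the IP addresses from the examples below as the pattern, and the helper name `window_matches` are all hypothetical and only illustrate the idea.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: one request/reply pair, then two requests without a reply
df = pd.DataFrame({
    'Source':      ['192.168.20.35', '192.168.20.31', '192.168.20.35', '192.168.20.35'],
    'Destination': ['192.168.20.31', '192.168.20.35', '192.168.20.31', '192.168.20.31'],
})

# Assumed 2x2 pattern: row 1 = (Source, Destination) of the request, row 2 = the reply
pattern = np.array([
    ['192.168.20.35', '192.168.20.31'],
    ['192.168.20.31', '192.168.20.35'],
], dtype=object)

def window_matches(frame, start, pattern):
    """Hypothetical helper: True if the rows starting at `start` equal the pattern."""
    window = frame[['Source', 'Destination']].iloc[start:start + len(pattern)].to_numpy()
    # A truncated window at the end of the frame has the wrong shape and never matches
    return window.shape == pattern.shape and bool((window == pattern).all())

print(window_matches(df, 0, pattern))  # True
print(window_matches(df, 2, pattern))  # False
```

Comparing whole NumPy arrays keeps the check independent of the window size: a larger pattern array only changes `len(pattern)`.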

In the first example I tried to check this by iterating over the rows. My third example is closer to what I am looking for: with the `rolling` method I define a window and a pattern that the code should check the rows against, but I get an error because the pattern is a string. This is what I would like it to look like.

import pandas as pd

df = pd.read_csv('hallo')

Here I filter out the interference by keeping only the ICMP packets:

Protocol_filtered = df[df['Protocol'] == 'ICMP']
Protocol_filtered1 = Protocol_filtered[['Time', 'Source', 'Destination', 'Info']]
Protocol_filtered1 = Protocol_filtered1.reset_index(drop=True)
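The filtering step above can also be written as a single `.loc` call that selects the rows and the columns at once and then renumbers the rows. This is just an equivalent, shorter spelling; the capture data below are hypothetical.

```python
import pandas as pd

# Hypothetical capture with a mix of protocols
df = pd.DataFrame({
    'Time': [0.0, 0.1, 0.2, 0.3],
    'Source': ['192.168.20.35', '192.168.20.1', '192.168.20.31', '192.168.20.35'],
    'Destination': ['192.168.20.31', '192.168.20.255', '192.168.20.35', '192.168.20.31'],
    'Protocol': ['ICMP', 'ARP', 'ICMP', 'ICMP'],
    'Info': ['Echo (ping) request', 'ARP request', 'Echo (ping) reply', 'Echo (ping) request'],
})

# Keep only ICMP rows and the four columns of interest, then renumber the rows 0..n-1
Protocol_filtered1 = (
    df.loc[df['Protocol'] == 'ICMP', ['Time', 'Source', 'Destination', 'Info']]
      .reset_index(drop=True)
)
print(Protocol_filtered1)
```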

Then I start checking for lost packets:

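s0 = 0
s1 = 1

# Note: the while loop already walks through all pairs, so the outer
# iterrows() loop only repeats it (later iterations do nothing)
for row in Protocol_filtered1.iterrows():
    # Compare the rows in pairs (0 & 1, 2 & 3, ...); "<" avoids reading
    # past the last row when the number of rows is odd
    while s1 < len(Protocol_filtered1):
        # The source of row s0 should be the destination of row s1
        source = Protocol_filtered1.loc[s0, 'Source']
        dest = Protocol_filtered1.loc[s1, 'Destination']

        # Note: this assigns the value to the whole column, not only rows s0 and s1
        if source == dest:
            Protocol_filtered1['Check'] = True
        else:
            Protocol_filtered1['Check'] = False

        # The source of row s1 should be the destination of row s0
        source1 = Protocol_filtered1.loc[s1, 'Source']
        dest1 = Protocol_filtered1.loc[s0, 'Destination']

        if source1 == dest1:
            Protocol_filtered1['Check1'] = True
        else:
            Protocol_filtered1['Check1'] = False

        s0 = s0 + 2
        s1 = s1 + 2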

This code does not give the result I want: for example, it returns True in row 2, where it should be False.

Protocol_filtered1.head()
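In the loop above, `Protocol_filtered1['Check'] = True` assigns the value to the whole column, so every row ends up with the result of the last pair checked. A minimal corrected sketch keeps the pairwise loop but writes each result only into the two rows of its pair with `.loc`. The toy data are hypothetical and chosen so that the expected result is the one stated further down (True, True, False, False, True, True).

```python
import pandas as pd

# Hypothetical ICMP rows: pairs (0, 1) and (4, 5) match, pair (2, 3) does not
Protocol_filtered1 = pd.DataFrame({
    'Source':      ['192.168.20.35', '192.168.20.31', '192.168.20.35',
                    '192.168.20.35', '192.168.20.35', '192.168.20.31'],
    'Destination': ['192.168.20.31', '192.168.20.35', '192.168.20.31',
                    '192.168.20.31', '192.168.20.31', '192.168.20.35'],
})

# Pre-create the result columns; an unpaired last row (odd length) keeps False
Protocol_filtered1['Check'] = False
Protocol_filtered1['Check1'] = False

s0, s1 = 0, 1
# Walk through the rows in pairs (0 & 1, 2 & 3, ...); no outer iterrows() loop needed
while s1 < len(Protocol_filtered1):
    # The source of each row must be the destination of the other row in the pair
    check = Protocol_filtered1.loc[s0, 'Source'] == Protocol_filtered1.loc[s1, 'Destination']
    check1 = Protocol_filtered1.loc[s1, 'Source'] == Protocol_filtered1.loc[s0, 'Destination']
    # Write the results only into the two rows of this pair
    Protocol_filtered1.loc[[s0, s1], 'Check'] = check
    Protocol_filtered1.loc[[s0, s1], 'Check1'] = check1
    s0 += 2
    s1 += 2

print(Protocol_filtered1['Check'].tolist())  # [True, True, False, False, True, True]
```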

The logic of the following code is correct, but it evaluates `i` for every row, whereas it should always check two successive rows together (0 & 1, 2 & 3, 4 & 5, ...):

pattern = ['192.168.20.35', '192.168.20.31']
# True where this row and the next one form the request/reply pattern
i = ((Protocol_filtered1['Source'] == '192.168.20.35') &
     (Protocol_filtered1['Source'].shift(-1) == '192.168.20.31'))
i &= ((Protocol_filtered1['Destination'] == '192.168.20.31') &
      (Protocol_filtered1['Destination'].shift(-1) == '192.168.20.35'))

# Index labels of the rows where a matching pair starts
Protocol_filtered1.index[i]

Protocol_filtered1['Check1'] = i

The result is as follows (it should be Check: True, True, False, False, True, True):

[image: resulting DataFrame]
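A vectorized sketch that only compares rows within fixed pairs (0 & 1, 2 & 3, ...) is to check, for each row, whether it matches the pattern row for its position in the pair, and then require both rows of a pair to match with `groupby(index // 2)`. It assumes a default `RangeIndex`, as produced by `reset_index(drop=True)`; the toy data are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical ICMP rows with a default RangeIndex
Protocol_filtered1 = pd.DataFrame({
    'Source':      ['192.168.20.35', '192.168.20.31', '192.168.20.35',
                    '192.168.20.35', '192.168.20.35', '192.168.20.31'],
    'Destination': ['192.168.20.31', '192.168.20.35', '192.168.20.31',
                    '192.168.20.31', '192.168.20.31', '192.168.20.35'],
})

# Expected Source/Destination for the first and the second row of each pair
src_pattern = np.array(['192.168.20.35', '192.168.20.31'])
dst_pattern = np.array(['192.168.20.31', '192.168.20.35'])

pos = (Protocol_filtered1.index % 2).to_numpy()    # 0 = first row of a pair, 1 = second row
pair = (Protocol_filtered1.index // 2).to_numpy()  # rows 0 & 1 -> pair 0, rows 2 & 3 -> pair 1, ...

# Does each row match the pattern row for its position within its pair?
row_ok = ((Protocol_filtered1['Source'] == src_pattern[pos]) &
          (Protocol_filtered1['Destination'] == dst_pattern[pos]))

# A pair passes only if it is complete and both rows match;
# transform() broadcasts the per-pair result back to both rows
grouped = row_ok.groupby(pair)
Protocol_filtered1['Check'] = grouped.transform('all') & (grouped.transform('count') == 2)

# Number of pairs that do not match the pattern
n_bad_pairs = int((~Protocol_filtered1.groupby(pair)['Check'].first()).sum())
print(Protocol_filtered1['Check'].tolist(), n_bad_pairs)
```

The `count == 2` condition keeps a lone last row (odd number of rows) from being reported as a complete, matching pair.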

A very elegant solution that I found on the forum and tried to apply is:

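pattern = ['192.168.20.35', '192.168.20.31']
obs = len(pattern)
# Slide a window of `obs` rows over the Source column and mark where it equals the pattern
Protocol_filtered1['S1'] = (Protocol_filtered1['Source']
                            .rolling(window=obs)  # min_periods defaults to the window size
                            .apply(lambda x: (x == pattern).all())
                            .astype(bool)
                            .shift(-1 * (obs - 1)))  # label each window by its first row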

But there also seems to be a problem with this code. I prefer this last approach, where I can define a pattern and a window size, let pandas go over the whole DataFrame, and then count the number of lost packets with isnull().
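As far as I know, pandas rolling windows only operate on numeric data, which would explain the error with a string pattern. One possible workaround, sketched here under that assumption, is to map the IP strings to integer codes with `pd.factorize`, encode the pattern with the same mapping, and compare codes inside the window. `min_periods` is left at its default (the window size), as suggested in the comments; the toy data are hypothetical.

```python
import pandas as pd

# Hypothetical Source column of the filtered ICMP rows
Protocol_filtered1 = pd.DataFrame({
    'Source': ['192.168.20.35', '192.168.20.31', '192.168.20.35',
               '192.168.20.35', '192.168.20.35', '192.168.20.31'],
})

pattern = ['192.168.20.35', '192.168.20.31']
obs = len(pattern)

# Encode each distinct IP string as an integer code (first seen -> 0, next -> 1, ...)
codes, uniques = pd.factorize(Protocol_filtered1['Source'])
# Encode the pattern with the same mapping (-1 for an address that never occurs)
pattern_codes = pd.Index(uniques).get_indexer(pattern)

Protocol_filtered1['S1'] = (
    pd.Series(codes, index=Protocol_filtered1.index)
      .rolling(window=obs)  # min_periods defaults to obs
      .apply(lambda x: float((x == pattern_codes).all()), raw=True)  # 1.0 = match
      .shift(-(obs - 1))    # label each window by its first row
)

print(Protocol_filtered1['S1'].tolist())  # [1.0, 0.0, 0.0, 0.0, 1.0, nan]
```

I left out `.astype(bool)`, because it turns NaN into True. Matching windows can be counted with `(Protocol_filtered1['S1'] == 1).sum()`; the last `obs - 1` rows are NaN because no complete window starts there, so `isnull()` picks up only those rows.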

I would really appreciate some help! Thank you very much!

Asked by S. Tudent; edited by Community.
  • Hi! Your question is not very clear. Perhaps you can try to rephrase to make it clear what you are looking for, an example of the expected result and (important) what you have tried so far. The idea would be to help you with a single problem at a time ;) – Luis Oct 30 '18 at 18:00
  • I hope it is clearer now. – S. Tudent Oct 31 '18 at 09:33
  • Why did you tag it with the `d` tag? It has nothing to do with the D programming language... – DejanLekic Nov 01 '18 at 11:32
  • sorry, it was by accident! – S. Tudent Nov 01 '18 at 17:59
  • If this is still an issue, could you clarify the problem seen with the `.rolling` approach? (Seems like you want to leave min_periods at the default (i.e., window size).) – Garrett Nov 21 '18 at 12:26

0 Answers