
I have a dataframe df of TCP packets with four columns: server, client, seq, and ack. For example:

server    client    seq         ack
A         B         207876062   2372538506
A         B         207876089   2372538616
B         A         2372538590  207876089
A         B         207876062   2372538590
B         A         2372538506  207876062

I would like to order the rows so that seq and ack chain together, i.e. the seq of each row equals the ack of the previous row:

server    client    seq         ack
A         B         207876062   2372538506
B         A         2372538506  207876062
A         B         207876062   2372538590
B         A         2372538590  207876089
A         B         207876089   2372538616

Is there a way to sort the rows into this order?

Thanks

char
  • Hi @Sushanth! Yes, I tried df.sort_values, but what I want is for the seq in row n+1 to equal the ack in row n, which is quite different from sorting in ascending or descending order. Thank you! – char Jul 26 '20 at 06:55
  • Can you show some realistic data instead of these hand-written "random numbers"? – John Zwinck Jul 26 '20 at 07:17
  • Hi @JohnZwinck, I have edited it, thank you. – char Jul 26 '20 at 07:38
  • There are two rows with the same seq; how would you know which to choose? – Itamar Mushkin Jul 26 '20 at 07:53
  • Yes! I had the same question when I got the data, and I don't think it matters for the later steps. It works fine as long as seq and ack match. Thanks! – char Jul 26 '20 at 08:01
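The condition described in the comments (the seq of row n+1 equals the ack of row n) can be checked with a vectorized comparison. This is a minimal sketch, assuming pandas and the column names from the question; the helper name `is_chained` is hypothetical:

```python
import pandas as pd

def is_chained(frame: pd.DataFrame) -> bool:
    """Return True if every row's ack equals the next row's seq (hypothetical helper)."""
    # shift(-1) moves each seq up by one row, so row n lines up with row n+1's seq
    next_seq = frame['seq'].shift(-1)
    # The last row has no successor (NaN after the shift), so leave it out
    return bool((frame['ack'].iloc[:-1] == next_seq.iloc[:-1]).all())

# Target order from the question
target = pd.DataFrame({
    'server': ['A', 'B', 'A', 'B', 'A'],
    'client': ['B', 'A', 'B', 'A', 'B'],
    'seq': [207876062, 2372538506, 207876062, 2372538590, 207876089],
    'ack': [2372538506, 207876062, 2372538590, 207876089, 2372538616],
})
print(is_chained(target))
```

The shift turns the seq column into floats (to hold the NaN), but integers of this size are represented exactly, so the equality test is safe.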

1 Answer


Here is what I would do, assuming df is the dataframe you want to process:

import pandas as pd

# Step 1 split the dataframe into two sub-dataframes, one per server
df_a = df[df['server'] == 'A']
df_b = df[df['server'] == 'B']

# Step 2 sort both sub-dataframes by the fields seq and ack
df_a = df_a.sort_values(by=['seq', 'ack'])
df_b = df_b.sort_values(by=['seq', 'ack'])

# Step 3 add a sorting key that numbers the rows of each sub-dataframe 1, 2, 3, ...
df_a['sorting_key'] = range(1, df_a.shape[0] + 1)
df_b['sorting_key'] = range(1, df_b.shape[0] + 1)

# Step 4 shift the sorting key of the second sub-dataframe by 0.5,
# so that its rows fall between the rows of the first one
df_b['sorting_key'] += 0.5

# Step 5 concatenate the two sub-dataframes and sort them by the sorting key
df_c = pd.concat([df_a, df_b]).sort_values(by=['sorting_key'])

# Step 6 clean up the result
df_c = df_c.reset_index(drop=True).drop(columns=['sorting_key'])
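For reference, here is a self-contained run of these steps on the example data from the question. It is a sketch assuming exactly two servers, A and B, with A's packet first in the conversation; with that data it should give the target order from the question:

```python
import pandas as pd

# Example data from the question
df = pd.DataFrame({
    'server': ['A', 'A', 'B', 'A', 'B'],
    'client': ['B', 'B', 'A', 'B', 'A'],
    'seq': [207876062, 207876089, 2372538590, 207876062, 2372538506],
    'ack': [2372538506, 2372538616, 207876089, 2372538590, 207876062],
})

# Split per server and sort each part by seq, then ack
df_a = df[df['server'] == 'A'].sort_values(by=['seq', 'ack'])
df_b = df[df['server'] == 'B'].sort_values(by=['seq', 'ack'])

# Number the rows of each part; B's keys are offset by 0.5 so they fall between A's
df_a = df_a.assign(sorting_key=range(1, len(df_a) + 1))
df_b = df_b.assign(sorting_key=[k + 0.5 for k in range(1, len(df_b) + 1)])

df_c = (
    pd.concat([df_a, df_b])
    .sort_values(by='sorting_key')
    .reset_index(drop=True)
    .drop(columns='sorting_key')
)
print(df_c)
```

This interleaving only reproduces the seq/ack chain when the two servers strictly alternate, as they do in the example.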

UPDATE

If you don't know how many servers there are, just add loops as follows:


import pandas as pd

# Step 1 split the dataframe into one sub-dataframe per server
# (unique() keeps the order of first appearance; iterating over a set has no fixed order)
sub_df = []
for e in df['server'].unique():
    sub_df.append(df[df['server'] == e])

# Step 2 sort each sub-dataframe by the fields seq and ack and add a sorting key
sub_df_1 = []
for tdf in sub_df:
    tdf = tdf.sort_values(by=['seq', 'ack'])
    tdf['sorting_key'] = range(1, tdf.shape[0] + 1)
    sub_df_1.append(tdf)

# Step 3 shift the sorting key of the i-th sub-dataframe by i / n (always below 1),
# so that the rows of the servers interleave in a fixed order
n = len(sub_df_1)
for i, tdf in enumerate(sub_df_1):
    tdf['sorting_key'] += i / n

# Step 4 concatenate the sub-dataframes and sort them by the sorting key
df_c = pd.concat(sub_df_1).sort_values(by=['sorting_key'])

# Step 5 clean up the result
df_c = df_c.reset_index(drop=True).drop(columns=['sorting_key'])
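The same round-robin interleaving can also be sketched without fractional offsets, using `groupby(...).cumcount()` to number the rows within each server and then sorting on the pair (row number, server position). This is a swapped-in technique, not the code above; it assumes the servers should take turns in order of first appearance, and `interleave_by_server` is a hypothetical name:

```python
import pandas as pd

def interleave_by_server(frame: pd.DataFrame) -> pd.DataFrame:
    """Take one row per server in turn, each server's rows ordered by seq, then ack.

    Hypothetical helper; servers take turns in order of first appearance.
    """
    # One global sort gives the seq/ack order within every server as well
    ordered = frame.sort_values(by=['seq', 'ack'])
    # 0, 1, 2, ... position of each row within its own server
    rank = ordered.groupby('server').cumcount()
    # Map each server to its position of first appearance in the input
    server_pos = {s: i for i, s in enumerate(frame['server'].unique())}
    pos = ordered['server'].map(server_pos)
    # Sort by (rank, server position): round 0 for all servers, then round 1, ...
    return (
        ordered.assign(_rank=rank, _pos=pos)
        .sort_values(by=['_rank', '_pos'])
        .drop(columns=['_rank', '_pos'])
        .reset_index(drop=True)
    )

df = pd.DataFrame({
    'server': ['A', 'A', 'B', 'A', 'B'],
    'client': ['B', 'B', 'A', 'B', 'A'],
    'seq': [207876062, 207876089, 2372538590, 207876062, 2372538506],
    'ack': [2372538506, 2372538616, 207876089, 2372538590, 207876062],
})
print(interleave_by_server(df))
```

Because the sort key is a pair of integers, no offset has to be chosen, and the same code handles any number of servers.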

Good luck

blondelg