How to sort the DataFrame by two columns in Python?

Question

I have a DataFrame df of TCP packets with four columns: server, client, seq, and ack. For example:

server    client    seq         ack
A         B         207876062   2372538506
A         B         207876089   2372538616
B         A         2372538590  207876089
A         B         207876062   2372538590
B         A         2372538506  207876062

I would like to sort by the columns seq and ack successively, so that the result looks like this:

server    client    seq         ack
A         B         207876062   2372538506
B         A         2372538506  207876062
A         B         207876062   2372538590
B         A         2372538590  207876089
A         B         207876089   2372538616

Is there any method to sort in the correct order?

Thanks

hi,@Sushanth! Ya, I tried the df.sort_values, but what I want is to let the seq in n+1 row be the ack in n row, and that was quite different from the question of sorting in ascending or descending, thank you! — char, Jul 26 '20 at 06:55
Can you show some realistic data instead of these hand-written "random numbers"? — John Zwinck, Jul 26 '20 at 07:17
There are two rows with the same seq; how would you know which to choose? — Itamar Mushkin, Jul 26 '20 at 07:53
ya! I had the same question when I got the data, and I think it does not matter afterwards. It would work fine if seq and ack could match. thanks!! — char, Jul 26 '20 at 08:01

blondelg · Answer 1 · 2020-07-26T09:53:42.560

Here is what I would do, assuming df is the DataFrame you want to process:

import pandas as pd

# Step 1: split the DataFrame into one sub-DataFrame per server
df_a = df[df['server'] == 'A']
df_b = df[df['server'] == 'B']

# Step 2: sort each sub-DataFrame by seq, then by ack
df_a = df_a.sort_values(by=['seq', 'ack'])
df_b = df_b.sort_values(by=['seq', 'ack'])

# Step 3: add a sorting key (1, 2, 3, ...) to each sub-DataFrame
df_a['sorting_key'] = range(1, df_a.shape[0] + 1)
df_b['sorting_key'] = range(1, df_b.shape[0] + 1)

# Step 4: shift the second DataFrame's keys by 0.5 so its rows fall between the first's
df_b['sorting_key'] = df_b['sorting_key'] + 0.5

# Step 5: concatenate the two DataFrames and sort them by the sorting key
df_c = pd.concat([df_a, df_b]).sort_values(by=['sorting_key'])

# Step 6: clean up the result (fresh index, no helper column)
df_c = df_c.reset_index(drop=True).drop(['sorting_key'], axis=1)
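
To try the steps above on the sample rows from the question, here is a minimal self-contained sketch. The helper names interleave_two_servers and is_chained are made up for illustration; they are not part of pandas. The check assumes the goal stated in the comments on the question, namely that the seq of each row equals the ack of the row before it. The interleaving only yields such a chain when the two directions alternate strictly, as they do in this sample.

```python
import pandas as pd

# The sample rows from the question
df = pd.DataFrame({
    'server': ['A', 'A', 'B', 'A', 'B'],
    'client': ['B', 'B', 'A', 'B', 'A'],
    'seq': [207876062, 207876089, 2372538590, 207876062, 2372538506],
    'ack': [2372538506, 2372538616, 207876089, 2372538590, 207876062],
})


def interleave_two_servers(df, first='A', second='B'):
    """Hypothetical helper wrapping steps 1 to 6 for exactly two servers."""
    # Steps 1 and 2: split per server and sort each part by seq, then ack
    # (explicit copy so the helper column is never written to a view of df)
    df_a = df[df['server'] == first].sort_values(by=['seq', 'ack']).copy()
    df_b = df[df['server'] == second].sort_values(by=['seq', 'ack']).copy()
    # Steps 3 and 4: keys 1, 2, 3, ... for the first server, 1.5, 2.5, ... for the second
    df_a['sorting_key'] = range(1, len(df_a) + 1)
    df_b['sorting_key'] = [k + 0.5 for k in range(1, len(df_b) + 1)]
    # Steps 5 and 6: merge, sort by the key, then drop the key
    merged = pd.concat([df_a, df_b]).sort_values(by='sorting_key')
    return merged.reset_index(drop=True).drop(columns='sorting_key')


def is_chained(df):
    """Return True if every row's seq equals the previous row's ack."""
    seq_next = df['seq'].iloc[1:].to_numpy()
    ack_prev = df['ack'].iloc[:-1].to_numpy()
    return bool((seq_next == ack_prev).all())


result = interleave_two_servers(df)
print(result)
print(is_chained(result))
```

Running is_chained on the unsorted df and on result is a quick way to confirm that the reordering did what was asked.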

UPDATE

If you don't know how many servers there are, just add loops as follows:


# Step 1: split the DataFrame into one sub-DataFrame per server
# (unique() keeps the order of first appearance; the order of set() is arbitrary)
sub_df = []
for e in df['server'].unique():
    sub_df.append(df[df['server'] == e])

# Step 2: sort each sub-DataFrame by seq, then by ack, and add a sorting key
sub_df_1 = []
for tdf in sub_df:
    tdf = tdf.sort_values(by=['seq', 'ack'])
    tdf['sorting_key'] = range(1, tdf.shape[0] + 1)
    sub_df_1.append(tdf)

# Step 3: shift the sorting key of every sub-DataFrame after the first
sub_df_2 = [sub_df_1[0]]
delta = 0.1
for tdf in sub_df_1[1:]:
    tdf['sorting_key'] = tdf['sorting_key'].apply(lambda x: x + delta)
    delta += delta / 10
    sub_df_2.append(tdf)

# Step 4: concatenate all sub-DataFrames and sort them by the sorting key
df_c = pd.concat(sub_df_2).sort_values(by=['sorting_key'])

# Step 5: clean up the result (fresh index, no helper column)
df_c = df_c.reset_index(drop=True).drop(['sorting_key'], axis=1)
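
For comparison, the same interleaving can probably be written without explicit loops or fractional keys. The sketch below is a different technique from the code above: it numbers the rows within each server with groupby(...).cumcount() and then sorts by that position, breaking ties by the order in which each server first appears. The function name interleave_by_server and the helper columns _rank and _server are made up for this example.

```python
import pandas as pd

# The sample rows from the question
df = pd.DataFrame({
    'server': ['A', 'A', 'B', 'A', 'B'],
    'client': ['B', 'B', 'A', 'B', 'A'],
    'seq': [207876062, 207876089, 2372538590, 207876062, 2372538506],
    'ack': [2372538506, 2372538616, 207876089, 2372538590, 207876062],
})


def interleave_by_server(df):
    """Hypothetical helper: interleave rows of any number of servers."""
    ordered = df.sort_values(by=['seq', 'ack'])
    # Position of each row within its own server group: 0, 1, 2, ...
    rank = ordered.groupby('server').cumcount()
    # Tie-breaker: servers ordered by first appearance in the original frame
    server_order = {name: i for i, name in enumerate(df['server'].unique())}
    ordered = ordered.assign(
        _rank=rank,
        _server=ordered['server'].map(server_order),
    )
    return (
        ordered.sort_values(by=['_rank', '_server'])
        .drop(columns=['_rank', '_server'])
        .reset_index(drop=True)
    )


out = interleave_by_server(df)
print(out)
```

Using an integer rank plus a separate tie-breaker column avoids having to pick a delta that stays below 1 when there are many servers.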

Good luck

thanks,@blondelg! It helps and gives me a new way of thinking. — char, Jul 26 '20 at 13:52
you're welcome! Don't hesitate to validate my answer if it solved your issue — blondelg, Jul 26 '20 at 14:08
