The goal of my question is to understand why this happens and if this is a defined behaviour. I need to know to design my unittests in a predictable way. I do not want or need to change that behaviour or work around it.
Here is the initial data on the left side complete and on the right side just all ID.eq(1)
but the order is the same as you can see in the index and the val
column.
| | ID | val | | | ID | val |
|---:|-----:|:------| |---:|-----:|:------|
| 0 | 1 | A | | 0 | 1 | A |
| 1 | 2 | B | | 3 | 1 | x |
| 2 | 9 | C | | 4 | 1 | R |
| 3 | 1 | x | | 6 | 1 | G |
| 4 | 1 | R | | 9 | 1 | a |
| 5 | 4 | F | | 12 | 1 | d |
| 6 | 1 | G | | 13 | 1 | e |
| 7 | 9 | H |
| 8 | 4 | I |
| 9 | 1 | a |
| 10 | 2 | b |
| 11 | 9 | c |
| 12 | 1 | d |
| 13 | 1 | e |
| 14 | 4 | f |
| 15 | 2 | g |
| 16 | 9 | h |
| 17 | 9 | i |
| 18 | 4 | X |
| 19 | 5 | Y |
This right table is also the result I would expected when doing the following:
When I sort by ID
the order of the rows inside the subgroups (e.g. ID.eq(1)
) is modified. Why is it so?
This is the unexpected result
| | ID | val |
|---:|-----:|:------|
| 0 | 1 | A |
| 13 | 1 | e |
| 12 | 1 | d |
| 6 | 1 | G |
| 9 | 1 | a |
| 3 | 1 | x |
| 4 | 1 | R |
This is a full MWE
#!/usr/bin/env python3
import pandas as pd
# initial data
df = pd.DataFrame(
{
'ID': [1, 2, 9, 1, 1, 4, 1, 9, 4, 1,
2, 9, 1, 1, 4, 2, 9, 9, 4, 5],
'val': list('ABCxRFGHIabcdefghiXY')
}
)
print(df.to_markdown())
# only the group "1"
print(df.loc[df.ID.eq(1)].to_markdown())
# sort by 'ID'
df = df.sort_values('ID')
# only the group "1" (after sorting)
print(df.loc[df.ID.eq(1)].to_markdown())