2

I'm having trouble formatting a dataframe in a specific style. I want to have data pertaining to one S/N all clumped together. My ultimate goal with the dataset is to plot Dis vs Rate for all the S/Ns. I've tired iterating over rows to slice data but that hasnt worked. What would be the best(easiest) approach for this formatting. Thanks!

For example: S/N 332 has Dis 4.6 and Rate of 91.2 in the first row, immediately after that I want it to have S/N 332 with Dis 9.19 and Rate 76.2 and so on for all rows with S/N 332.

    S/N      Dis       Rate
0   332   4.6030  91.204062
1   445   5.4280  60.233917
2   999   4.6030  91.474156
3   332   9.1985  76.212943
4   445   9.7345  31.902842
5   999   9.1985  76.212943
6   332  14.4405  77.664282
7   445  14.6015  36.261851
8   999  14.4405  77.664282
9   332  20.2005  76.725955
10  445  19.8630  40.705467
11  999  20.2005  76.725955
12  332  25.4780  31.597510
13  445  24.9050   4.897008
14  999  25.4780  31.597510
15  332  30.6670  74.096975
16  445  30.0550  35.217889
17  999  30.6670  74.096975

Edit: Tried using sort as @Ian Kenney suggested but that doesn't help because now the Dis values are no longer in the ascending order:

0     332    4.6030    91.204062
15    332   30.6670    74.096975
3     332    9.1985    76.212943
6     332   14.4405    77.664282
9     332   20.2005    76.725955
12    332   25.4780    31.597510
1     445    5.4280    60.233917
4     445    9.7345    31.902842
7     445   14.6015    36.261851
16    445   30.0550    35.217889
10    445   19.8630    40.705467
13    445   24.9050     4.897008
tdy
  • 36,675
  • 19
  • 86
  • 83
LeafTeaNeko
  • 113
  • 1
  • 13

2 Answers2

5

Use sort_values, which can accept a list of sorting targets. In this case it sounds like you want to sort by S/N, then Dis, then Rate:

df = df.sort_values(['S/N', 'Dis', 'Rate'])

#     S/N      Dis       Rate
# 0   332   4.6030  91.204062
# 3   332   9.1985  76.212943
# 6   332  14.4405  77.664282
# 9   332  20.2005  76.725955
# 12  332  25.4780  31.597510
# 15  332  30.6670  74.096975
# 1   445   5.4280  60.233917
# 4   445   9.7345  31.902842
# 7   445  14.6015  36.261851
# 10  445  19.8630  40.705467
# 13  445  24.9050   4.897008
# 16  445  30.0550  35.217889
# ...
tdy
  • 36,675
  • 19
  • 86
  • 83
1

You can also achieve this by several ways, another way from the already existing answer is,

df.sort_values(by = ['S/N', "Dis", 'Rate'], inplace = True)
df

Output:

    S/N Dis     Rate
0   332 4.6030  91.204062
3   332 9.1985  76.212943
6   332 14.4405 77.664282
9   332 20.2005 76.725955
12  332 25.4780 31.597510
15  332 30.6670 74.096975
1   445 5.4280  60.233917
4   445 9.7345  31.902842
7   445 14.6015 36.261851
10  445 19.8630 40.705467
13  445 24.9050 4.897008
16  445 30.0550 35.217889
2   999 4.6030  91.474156
5   999 9.1985  76.212943
8   999 14.4405 77.664282
11  999 20.2005 76.725955
14  999 25.4780 31.597510
17  999 30.6670 74.096975

Here, the Inplace argument used within the sort_values function directly make the changes in the source dataframe which will eliminate the need to create another dataframe to store the sorted output.

  • 1
    Note that [`inplace` is not recommended](https://stackoverflow.com/a/60020384/13138364) and will be [deprecated soon](https://github.com/pandas-dev/pandas/issues/16529). In fact it still creates another dataframe, so the name is misleading. – tdy Jan 11 '22 at 01:41