Why is sort_values() different from sort_values().values?

Question

I want to sort a DataFrame by all columns, and I found a way to do that using

df = df.apply( lambda x: x.sort_values())

and I applied it to my data:

text1 = text
text = text.apply( lambda x : x.sort_values())
text1 = text1.apply( lambda x : x.sort_values().values)
text.head()
text1.head()

Why does text = text.apply(lambda x: x.sort_values()) give a wrong answer, and what does .values do?

text.head()
    Wave    2881.394531 2880.574219 2879.75293  2878.931641 2878.111328
    N-1     0.220934    0.203666    0.205743    0.196011    0.176293
    N-10    0.432692    0.387074    0.395692    0.355331    0.358963
    N-11    0.483360    0.463233    0.456304    0.428930    0.421482
    N-12    0.365057    0.364417    0.385134    0.352451    0.350513
    N-13    0.492172    0.466263    0.480657    0.439115    0.404883


text1.head()
    Wave    2881.394531 2880.574219 2879.75293  2878.931641 2878.111328
    P+1    -21.297623   -25.141329  -21.097095  -31.380476  -38.847958
    P+2    -12.681051   -14.661134  -13.688742  -16.829298  -20.320133
    P+3    -8.164744    -13.097990  -11.784309  -15.419610  -17.822252
    P+4    -0.023353    -0.926852   -8.036203   -14.583183  -17.071484
    P+5     0.022854    -0.037756   -0.002519   -1.891178   -7.795961

score 2 · Answer 1 · answered Nov 14 '18 at 03:36

By default, Pandas operations align data based on their index. So consider for example

In [19]: df = pd.DataFrame([(10,1),(9,2),(8,3),(7,4)], index=list('ABDC'))

In [20]: df
Out[20]: 
    0  1
A  10  1
B   9  2
D   8  3
C   7  4

When Pandas evaluates df.apply(lambda x: x.sort_values()), it generates the Series:

In [24]: df[0].sort_values()
Out[24]: 
C     7
D     8
B     9
A    10
Name: 0, dtype: int64

In [25]: df[1].sort_values()
Out[25]: 
A    1
B    2
D    3
C    4
Name: 1, dtype: int64

and then tries to combine these two Series into a resultant DataFrame. It does that by aligning the indices:

In [21]: df.apply(lambda x: x.sort_values())   
Out[21]: 
    0  1
A  10  1
B   9  2
C   7  4
D   8  3
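
The alignment step can be reproduced without apply. This is a minimal sketch, assuming that combining the per-column results behaves like building a DataFrame from a dict of Series; the names sorted_columns and combined are illustrative only.

```python
import pandas as pd

df = pd.DataFrame([(10, 1), (9, 2), (8, 3), (7, 4)], index=list("ABDC"))

# Each sorted column keeps its index labels attached to its values.
sorted_columns = {col: df[col].sort_values() for col in df.columns}

# Building a DataFrame from Series matches rows by label, not by position,
# so every label ends up with the same values it had in df.
combined = pd.DataFrame(sorted_columns)

for label in df.index:
    print(label, combined.loc[label].tolist(), df.loc[label].tolist())
```

For every label the two printed lists are the same, which is why the sort appears to have had no effect on the values.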

In contrast, when the lambda function returns a NumPy array there is no index to align upon. So Pandas merely pastes the values from the NumPy array into a resultant DataFrame in the same order.

So, when Pandas evaluates df.apply(lambda x: x.sort_values().values), it generates the NumPy arrays:

In [26]: df[0].sort_values().values
Out[26]: array([ 7,  8,  9, 10])

In [27]: df[1].sort_values().values
Out[27]: array([1, 2, 3, 4])

and then combines these two NumPy arrays into a resultant DataFrame with the values in the same order:

In [28]: df.apply(lambda x: x.sort_values().values)   
Out[28]: 
    0  1
A   7  1
B   8  2
D   9  3
C  10  4
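
Here are the two cases side by side as a script, using the same df. The sketch uses .to_numpy() in place of .values; that substitution is an assumption on my part (the pandas documentation recommends it in newer versions) and is not part of the session above.

```python
import pandas as pd

df = pd.DataFrame([(10, 1), (9, 2), (8, 3), (7, 4)], index=list("ABDC"))

# Returns a Series: the results are re-aligned on the index labels,
# so each label keeps its original row values.
aligned = df.apply(lambda x: x.sort_values())

# Returns a NumPy array: there are no labels to align on, so the sorted
# values are placed under the original index in positional order.
positional = df.apply(lambda x: x.sort_values().to_numpy())

print(aligned)
print(positional)
```

Note that the second form sorts each column independently, so the values in a given row no longer belong to that row's original label.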

That is a correct and detailed answer, thank you very much ！ — X.tang, Nov 14 '18 at 08:41

score 0 · Answer 2 · answered Nov 14 '18 at 03:26

Welcome to StackOverflow!

Based on the pandas documentation, sort_values() returns a new sorted object (here a Series, because apply() passes each column to the function as a Series) that still carries its index, while .values returns the NumPy array representation of the data, without the index. apply() calls the function on each column along the given axis and then combines the results. When the function returns a Series, the results are aligned by index label, which puts every value back on its original row; when it returns a NumPy array, the values stay in sorted order. That is why you get an unexpected result when you use only sort_values().
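
A quick check of what each expression returns may help here. This is a small sketch on a made-up Series s (not the data from the question). Note that .values is an attribute rather than a method, so it is written without parentheses.

```python
import numpy as np
import pandas as pd

s = pd.Series([3, 1, 2], index=["x", "y", "z"])  # hypothetical data

sorted_series = s.sort_values()         # a new Series; labels travel with the values
sorted_array = s.sort_values().values   # a plain NumPy array; the labels are dropped

print(type(sorted_series).__name__, list(sorted_series.index))
print(isinstance(sorted_array, np.ndarray), sorted_array.tolist())
```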

You can read a more complete explanation in the documentation for sort_values(), values, and apply().
