import numpy as np
import pandas as pd

test_data = pd.DataFrame(
    dict(
        value=np.random.rand(9) - 0.5,
        group=np.repeat(np.arange(3), 3),
        time_idx=np.tile(np.arange(3), 3),
    )
)
test_data
train_data = pd.DataFrame(
    dict(
        value=np.random.rand(9) - 0.5,
        group=np.repeat(np.arange(3), 3),
        time_idx=np.tile(np.arange(3), 3),
    )
)
train_data

Why does this assignment:

for data in [train_data, test_data]:
    data = data.sort_values('value')

not leave train_data or test_data sorted?

Doing the assignment outside the loop works just fine, like so:

train_data = train_data.sort_values('value')

Doing an inplace operation inside the for loop works as well:

for data in [train_data, test_data]:
    data.sort_values('value', inplace=True)
FabianH

1 Answer

When you do:

for data in [train_data, test_data]:
    data = data.sort_values('value')

On the first iteration the name data is bound to train_data. sort_values returns a new, sorted DataFrame, and the assignment rebinds data to that new object. The same happens with test_data on the second iteration. Neither train_data nor test_data is modified, and after the loop data holds the sorted copy of test_data, because each assignment only rebinds the loop variable.

It is equivalent to writing:

data = train_data
data = data.sort_values('value')
data = test_data
data = data.sort_values('value')
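
The rebinding can be checked directly with an identity test. This is a minimal sketch, assuming a small hand-written frame (the fixed values are illustrative, not the random data from the question):

```python
import pandas as pd

# Illustrative fixed values so the result is deterministic.
train_data = pd.DataFrame({"value": [0.3, -0.2, 0.1]})

data = train_data                    # both names refer to the same object
same_before = data is train_data

data = data.sort_values("value")     # a new DataFrame is created; data is rebound to it
same_after = data is train_data

original_index = list(train_data.index)   # original frame keeps its row order
sorted_index = list(data.index)           # the new frame is sorted by value
```

After the second assignment the two names no longer refer to the same object, which is why train_data keeps its original order.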

When you do:

for data in [train_data, test_data]:
    data.sort_values('value', inplace=True)

With inplace=True, sort_values modifies the object that data currently refers to. That object is train_data on the first iteration and test_data on the second, so both are sorted. After the loop, data still refers to test_data.

It is equivalent to:

train_data.sort_values('value',inplace=True)
data = train_data 
test_data.sort_values('value',inplace=True)
data = test_data
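
A short check of the in-place case, again as a sketch with illustrative fixed values: the call returns None and the change is visible through every name bound to the same object.

```python
import pandas as pd

# Illustrative fixed values so the result is deterministic.
train_data = pd.DataFrame({"value": [0.3, -0.2, 0.1]})

data = train_data                                   # second name for the same object
result = data.sort_values("value", inplace=True)    # mutates the object, returns None

still_same_object = data is train_data
train_values = list(train_data["value"])            # sorted, seen through the original name
```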

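The assignment data = ... inside the for loop does work; it just rebinds the loop variable data and leaves the names train_data and test_data pointing at the original, unsorted objects. If you want the loop to sort train_data and test_data themselves, inplace=True is the option you are looking for.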
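
If you would rather avoid inplace=True, one possible alternative (not part of the original question, shown here only as a sketch with illustrative values) is to rebind the original names themselves by unpacking a list of sorted copies:

```python
import pandas as pd

# Illustrative fixed values so the result is deterministic.
train_data = pd.DataFrame({"value": [0.3, -0.2, 0.1]})
test_data = pd.DataFrame({"value": [0.0, 0.4, -0.1]})

# Build the sorted copies, then assign them back to the original names.
train_data, test_data = [
    df.sort_values("value") for df in (train_data, test_data)
]

train_values = list(train_data["value"])
test_values = list(test_data["value"])
```

Here the assignment targets are train_data and test_data rather than a loop variable, so the names end up bound to the sorted frames.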

RobertoT