
Can anyone provide an explanation of what it means to have a "slice" vs. a "copy" in pandas? I've been working with pandas for a while, and have internalized some rules of thumb about how to avoid the warnings.

But a colleague ran into some weird behavior today that I think is traceable to the same distinction, and it made me realize that I don't really understand what's going on under the hood or how it plays out in different situations. I'd love an explanation!

--

Today's example:

    def func(df):
        # DataFrame.sort was the API at the time; newer pandas uses sort_values
        df.sort('sort_col', inplace=True)
        # ... some other stuff ...
        return modified_df

    grouped = df.groupby('col1')
    result = grouped.apply(func)

func ended up returning the same modified_df each time, until we changed the sort to df2 = df.sort('sort_col').copy(). I think this has to do with "By default the group keys are sorted during the groupby operation" from the pandas docs, but I'm confused about what exactly is happening.
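For reference, here is a minimal sketch of the non-mutating variant we ended up with, written against current pandas (`sort_values` instead of the old `sort`). The column values are made up for illustration, and it only shows that sorting without `inplace=True` leaves the original frame alone; it is not a diagnosis of the original bug.

```python
import pandas as pd

# Made-up data; only the column names come from the question.
df = pd.DataFrame({
    "col1": ["a", "b", "a", "b"],
    "sort_col": [2, 4, 1, 3],
})

def func(group):
    # sort_values returns a new DataFrame, so the group passed in
    # by groupby is not modified in place.
    return group.sort_values("sort_col")

# Select the columns to work on explicitly before calling apply.
result = df.groupby("col1")[["sort_col"]].apply(func)

print(result)
print(df)  # the original frame keeps its original row order
```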

Thanks!

Paritosh Singh
exp1orer

2 Answers


In NumPy, a basic slice is a view on the original data, so any modification made through the view is reflected in the source array. For example, if this is your array:

    import numpy as np
    my_array = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10])

If you take a slice and assign a scalar value to it, the original array changes.

    my_array[4:6] = 12
    print(my_array)
    # [ 0  1  2  3 12 12  6  7  8  9 10]

However, you can take a copy of the slice instead of a view, which prevents changes from reaching the original array.

    slice_copy = my_array[4:6].copy()
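To make the difference visible, here is a small sketch that holds the view and the copy in separate variables and writes to each. The variable names are just for illustration; `np.shares_memory` is used to check whether two arrays share the same underlying buffer.

```python
import numpy as np

original = np.arange(11)

view = original[4:6]                # basic slicing gives a view
independent = original[4:6].copy()  # explicit copy owns its own data

view[:] = 12         # writes through to `original`
independent[:] = -1  # only changes the copy

print(np.shares_memory(original, view))         # True
print(np.shares_memory(original, independent))  # False
print(original)  # [ 0  1  2  3 12 12  6  7  8  9 10]
```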

Hope this helps.

Amrita Sawant

A slice means that you are operating (or can operate) on the underlying object itself, so any changes made to it will be reflected in that object. A copy is, well, simply a copy. Any changes on a copy will not be reflected in the underlying object.
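A rough pandas illustration of the same idea, assuming a small made-up frame: taking an explicit `.copy()` of a selection gives an independent object, while assigning through `.loc` on the original frame changes the frame itself. Whether a plain selection without `.copy()` behaves as a view or a copy depends on the operation and the pandas version, which is why this sketch sticks to the two unambiguous cases.

```python
import pandas as pd

# Made-up data for illustration.
df = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30]})

# Explicit copy: changes stay in `subset`.
subset = df[df["a"] > 1].copy()
subset["b"] = 0
print(df["b"].tolist())  # [10, 20, 30]

# Assignment through .loc on the original: `df` itself changes.
df.loc[df["a"] > 1, "b"] = -1
print(df["b"].tolist())  # [10, -1, -1]
```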

See this post for more info.

Alexander