
The DataFrame:

import pandas as pd
import numpy as np

df = pd.DataFrame({'col1': [2, 6, 7, 8, 2]})
   col1
0     2
1     6
2     7
3     8
4     2

A function to create a new column holding the cumulative sum (I'm aware of cumsum(); this is just a simple test before writing more complex functions):

def funcadd(inputarr):
    for i in range(1, len(inputarr)):
        inputarr[i] = inputarr[i] + inputarr[i-1]
    return inputarr

df['new'] = funcadd(df['col1'].values)
   col1  new
0     2    2
1     8    8
2    15   15
3    23   23
4    25   25

As you can see, col1 gets modified even though I never issued a command to change it. Why?

I've tried:

  • assigning arr1 = inputarr at the top of the function and using only arr1 in the rest of the function
  • assigning arr2 = df['col1'].values first and calling the function with arr2
  • swapping .values for .to_numpy()
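
The first two attempts above only create another name for the same array, which is why they change nothing. A minimal sketch of that behaviour in plain NumPy, with no pandas involved (the names original, alias and independent are made up for illustration):

```python
import numpy as np

original = np.array([2, 6, 7, 8, 2])

# Plain assignment copies nothing: `alias` is a second name for the same array.
alias = original
alias[1] = 100                  # the write is visible through `original` too

# .copy() allocates a new array with its own memory.
independent = original.copy()
independent[0] = -1             # leaves `original` untouched

print(alias is original)                        # True
print(np.shares_memory(independent, original))  # False
print(original.tolist())                        # [2, 100, 7, 8, 2]
```

The same reasoning applies to arr1 = inputarr inside the function: both names refer to one array, so writes through either one are seen by the other.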
Amir Afianian
yes
  • Try `df['new'] = funcadd(df['col1'].copy())`. col1 gets modified because you are passing a shallow copy of `df['col1']` into the function. For more on copies, see [What is the difference between shallow copy, deepcopy and normal assignment operation?](https://stackoverflow.com/questions/17246693/what-is-the-difference-between-shallow-copy-deepcopy-and-normal-assignment-oper) – Anurag Dabas Jun 19 '21 at 06:08

1 Answer


There are better ways to compute a cumulative sum in pandas, as shown in the other answer, but this answer only addresses the issue at hand.

In this line

inputarr[i] = inputarr[i] + inputarr[i-1]

you are writing each result back into the array that was passed in. Since no copy was made, and .values handed over an array backed by the DataFrame's own data, those writes land in the original DataFrame.
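
One way to check whether an array is backed by the column's data is np.shares_memory. This is a sketch, not a guarantee about pandas internals: for a single numeric column like this one the check typically reports shared memory, but the result can depend on dtype and pandas version. Newer pandas versions with Copy-on-Write enabled may also hand back a read-only array, in which case the in-place loop raises an error instead of silently changing col1.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'col1': [2, 6, 7, 8, 2]})

arr = df['col1'].values

# True when `arr` is backed by the column's data rather than a separate copy.
shares = np.shares_memory(arr, df['col1'].values)
print(shares)

# An explicit copy never shares memory with its source.
arr_copy = arr.copy()
print(np.shares_memory(arr_copy, arr))  # False
```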

If you want to avoid this, create a copy inside the function, like this:

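def funcadd(inputarr):
    # Work on a copy so the caller's array (and the DataFrame behind it) is left untouched.
    c_inputarr = inputarr.copy()
    for i in range(1, len(inputarr)):
        # Running total: add the previous cumulative value to the current element.
        c_inputarr[i] = c_inputarr[i] + c_inputarr[i-1]
    return c_inputarr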

df['new'] = funcadd(df['col1'].values)

which gives

   col1  new
0     2    2
1     6    8
2     7   15
3     8   23
4     2   25
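
An alternative is to leave the question's function as it was and make the copy at the call site instead. The sketch below assumes Series.to_numpy(copy=True), which requests an array that is not a view of the column, and compares the result with the built-in cumsum() the question mentions. The column name builtin is made up for illustration.

```python
import pandas as pd

def funcadd(inputarr):
    # The question's original in-place loop, unchanged.
    for i in range(1, len(inputarr)):
        inputarr[i] = inputarr[i] + inputarr[i-1]
    return inputarr

df = pd.DataFrame({'col1': [2, 6, 7, 8, 2]})

# copy=True asks for a fresh array, so the in-place loop cannot touch col1.
df['new'] = funcadd(df['col1'].to_numpy(copy=True))

# Built-in equivalent for this particular test case.
df['builtin'] = df['col1'].cumsum()

print(df)
```

Copying inside the function is the safer default when callers should never see side effects. Copying at the call site keeps the function allocation-free for callers who do want in-place updates.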
Amit Gupta
  • How come `df['col1'].values = 0` doesn't work? Do I only need `.copy()` if I'm passing it to a function? – yes Jun 19 '21 at 06:21
  • Do you want to assign 0 to all of `df['col1'].values`? – Amit Gupta Jun 19 '21 at 06:23
  • No, just wondering why I can't change `df['col1'].values` outside the function but can inside one. I just realised `df['col1'].values = 0` doesn't work inside a function either. But, for instance, `df['col1'].values += 1` doesn't work outside the function, whereas if you pass it to a function as the argument `input`, you can change it with `input += 1`. – yes Jun 19 '21 at 06:30
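
The difference raised in the last comment is likely the difference between rebinding an attribute and mutating an array in place. The sketch below uses a hypothetical Holder class with a read-only values property as a stand-in, and assumes a pandas Series exposes .values in a similar read-only way. It shows Python's general behaviour, not pandas internals.

```python
import numpy as np

class Holder:
    """Hypothetical stand-in for an object exposing its data via a read-only property."""
    def __init__(self, data):
        self._data = np.array(data)

    @property
    def values(self):
        # No setter is defined, so `holder.values = ...` is rejected.
        return self._data

def add_one(arr):
    arr += 1        # in-place: mutates the array the caller passed in
    return arr

holder = Holder([2, 6, 7, 8, 2])

assignment_rejected = False
try:
    holder.values = 0           # tries to rebind the attribute, not to change the data
except AttributeError:
    assignment_rejected = True

augmented_rejected = False
try:
    # The in-place add runs first, then writing the result back to the property fails.
    holder.values += 1
except AttributeError:
    augmented_rejected = True

after_augmented = holder.values.tolist()
print(assignment_rejected, augmented_rejected, after_augmented)

# Inside a function there is no attribute to write back to, only a local name.
add_one(holder.values)
print(holder.values.tolist())
```

In this stand-in, the augmented assignment raises an error even though the underlying data has already been incremented, while the function call succeeds because it only mutates the array and never assigns to the property.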