I don't understand what the best practice is here:

I want to modify the DataFrame `data` inside my function. `data` is defined globally. However, if I use the `global` statement in the function, I get an error, which I assume is because `data =` defines a local variable.

import pandas as pd

data = pd.DataFrame({'A' : [1, 2, 3, 4],
                     'B' : [1, 2, 3, 4]})

def test(data):
    global data
    data =  data + 1
    return data

test(data) 
SyntaxError: name 'data' is local and global

Does that mean I cannot use the `global` statement when working with DataFrames?

def test2(data):
    data =  data + 1
    return data

does not work either. That is, the original `data` is not modified.

What am I missing here?

Zeugma
ℕʘʘḆḽḘ
  • You don't need a global variable if you are returning the same object. Just comment that part out and run the program. – Shijo Feb 08 '17 at 15:39

1 Answer

If you want to act on the global data in your function, don't pass it as a parameter:

import pandas as pd

data = pd.DataFrame({'A' : [1, 2, 3, 4],
                     'B' : [1,2,3,4]})
def test():
    global data  # assignments to data now rebind the module-level name
    data = data + 1

test()
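As a small sketch of why the version in the question fails, assuming Python 3: the error is raised when the function is compiled, before any DataFrame is involved, because `data` is both a parameter and declared `global`. The `source` string and the `error_message` name are only there for illustration, and the exact wording of the message differs between Python versions.

```python
# The function from the question, held as a string so the SyntaxError
# can be caught instead of stopping the script.
source = """
def test(data):
    global data
    data = data + 1
    return data
"""

try:
    compile(source, "<example>", "exec")
    error_message = None
except SyntaxError as exc:
    # Raised at compile time: a name cannot be a parameter and global at once.
    error_message = exc.msg

print(error_message)
```

Dropping the parameter, as above, removes the conflict, so the problem is not specific to DataFrames.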

Another option would be to keep the parameter and assign the result of calling the function:

import pandas as pd

data = pd.DataFrame({'A' : [1, 2, 3, 4],
                     'B' : [1,2,3,4]})

def test(data):
    data = data + 1
    return data

data = test(data)
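To see why the assignment on the last line matters, here is a minimal sketch (the names `add_one`, `original`, and `result` are illustrative, not from the question). `frame + 1` builds a new DataFrame, and assigning it to the parameter only rebinds the local name, so the caller's object stays as it was unless the return value is used.

```python
import pandas as pd

def add_one(frame):
    # frame + 1 creates a new DataFrame; this only rebinds the local name.
    frame = frame + 1
    return frame

original = pd.DataFrame({'A': [1, 2, 3, 4],
                         'B': [1, 2, 3, 4]})

result = add_one(original)

print(original['A'].tolist())  # the caller's DataFrame, untouched by the call
print(result['A'].tolist())    # the values of the returned DataFrame
print(result is original)      # whether both names refer to one object
```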

You can see that using the same name for both the global and local variables makes things a bit confusing. If you want to go that route, using different names could make it a bit easier on the brain:

import pandas as pd

g_data = pd.DataFrame({'A' : [1, 2, 3, 4],
                       'B' : [1,2,3,4]})

def test(data):
    data =  data + 1
    return data

g_data = test(g_data)
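If the goal is to change the caller's DataFrame without `global` and without reassigning the result, one possibility is to mutate the object that was passed in rather than rebind the name. The sketch below rests on that assumption about what is wanted and is not one of the approaches above; `increment_in_place`, `frame`, and `same_object` are illustrative names.

```python
import pandas as pd

def increment_in_place(frame):
    # Assigning to columns mutates the object the caller passed in,
    # instead of rebinding the local name to a new DataFrame.
    for column in frame.columns:
        frame[column] = frame[column] + 1

g_data = pd.DataFrame({'A': [1, 2, 3, 4],
                       'B': [1, 2, 3, 4]})
same_object = g_data  # second reference, to confirm no new object is created

increment_in_place(g_data)

print(g_data)
print(g_data is same_object)
```

Note that this style makes the function's effect invisible at the call site, which is one reason returning the result and assigning it is often easier to follow.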
jas
  • Interesting. But then I don't get why the `return` is needed here. Isn't the DataFrame modified in place at the line `data = data + 1`? – ℕʘʘḆḽḘ Feb 08 '17 at 15:42
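  • It's not needed in the first version, where `global data` makes the assignment rebind the global name. In the versions that take a parameter, `data = data + 1` is not in place: it binds a new DataFrame to the local name, so the `return` is how the caller gets the result. – jas Feb 08 '17 at 15:43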
  • That actually raises an interesting question. If you do keep the `return`, is the data duplicated in memory? That could crash your computer when working with large datasets. What do you think? – ℕʘʘḆḽḘ Feb 08 '17 at 15:47
  • I don't think there would be any problem with memory in that case because really what gets returned is a reference to the object (essentially a pointer). It's good to think about stuff like that, though! – jas Feb 08 '17 at 15:51
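The point about references can be checked with `is`. In this sketch (with illustrative names `passthrough`, `returned`, and `incremented`), returning an object does not copy it; the second DataFrame in the examples comes from evaluating `data + 1`, which allocates its result whether or not the function returns it.

```python
import pandas as pd

def passthrough(frame):
    # Returning hands back a reference to the same object; nothing is copied.
    return frame

data = pd.DataFrame({'A': [1, 2, 3, 4],
                     'B': [1, 2, 3, 4]})

returned = passthrough(data)
incremented = data + 1  # the arithmetic is what creates a second DataFrame

print(returned is data)     # same object as the one passed in
print(incremented is data)  # a separate object holding the new values
```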