I cannot reference the DataFrame created in main() from another function, print_df_in_a_function(). Could anyone please point out the mistake I made here?

It displays None upon calling print_df_in_a_function().

import numpy as np
import pandas as pd

df1 = None

def main():
    df1 = pd.DataFrame(np.array([['a', 5, 9],
                                 ['b', 4, 61],
                                 ['c', 24, 9]]),
                                 columns=['name', 'attr1', 'attr2'])

    print(df1)
    print_df_in_a_function()

def print_df_in_a_function():
    print(df1)

if __name__ == '__main__':
    main()
valiano
Roy
  • Instead of `print()`, use `return` there. Then in your `main`, assign the result of the function call to `some_value` and print that value. Functions return `None` by default. – void Jul 03 '17 at 04:38
  • @s_vishnu that isn't the issue. The issue is that `df1` inside `print_df_in_a_function` resolves to the *global* `df1`, not the `df1` local to `main`. This is because Python has lexical scoping instead of dynamic scoping (which would print the `df1` defined in `main`). – juanpa.arrivillaga Jul 03 '17 at 04:39
  • Oh yes man. Just now ran the code. Got it – void Jul 03 '17 at 04:39
  • Well, in that case you need to pass df1 to the function to print what you want. Otherwise you will just get `None`. – void Jul 03 '17 at 04:41
  • @AmiTavory actually, I don't think this is an exact duplicate. The issue here would not be solved with the `global` directive being used in `print_df_in_a_function`. The problem is that the OP assumes that Python has dynamic scoping instead of lexical scope. – juanpa.arrivillaga Jul 03 '17 at 04:44

3 Answers

The issue is that df1 inside print_df_in_a_function resolves to the global df1, not the df1 local to main. This is because Python has lexical scoping instead of dynamic scoping. From Wikipedia:

A fundamental distinction in scoping is what "part of a program" means. In languages with lexical scope (also called static scope), name resolution depends on the location in the source code and the lexical context, which is defined by where the named variable or function is defined. In contrast, in languages with dynamic scope the name resolution depends upon the program state when the name is encountered which is determined by the execution context or calling context. In practice, with lexical scope a variable's definition is resolved by searching its containing block or function, then if that fails searching the outer containing block, and so on, whereas with dynamic scope the calling function is searched, then the function which called that calling function, and so on, progressing up the call stack.[4] Of course, in both rules, we first look for a local definition of a variable.

If Python used dynamic scoping, it would work as you intended. Because of lexical scoping, we see this behavior instead:

In [1]: import numpy as np
   ...: import pandas as pd
   ...:
   ...: df1 = None
   ...:
   ...: def main():
   ...:     df1 = pd.DataFrame(np.array([['a', 5, 9],
   ...:                                  ['b', 4, 61],
   ...:                                  ['c', 24, 9]]),
   ...:                                  columns=['name', 'attr1', 'attr2'])
   ...:
   ...:     print(df1)
   ...:     print_df_in_a_function()
   ...:
   ...: def print_df_in_a_function():
   ...:     print(df1)
   ...:
   ...: if __name__ == '__main__':
   ...:     main()
   ...:
  name attr1 attr2
0    a     5     9
1    b     4    61
2    c    24     9
None

Note that if we move the definition of print_df_in_a_function inside main, the name resolves to the df1 inside main:

In [3]: import numpy as np
   ...: import pandas as pd
   ...:
   ...: df1 = None
   ...:
   ...: def main():
   ...:     def print_df_in_a_function():
   ...:         print(df1)
   ...:     df1 = pd.DataFrame(np.array([['a', 5, 9],
   ...:                                  ['b', 4, 61],
   ...:                                  ['c', 24, 9]]),
   ...:                                  columns=['name', 'attr1', 'attr2'])
   ...:
   ...:     print(df1)
   ...:     print_df_in_a_function()
   ...:
   ...: if __name__ == '__main__':
   ...:     main()
   ...:
  name attr1 attr2
0    a     5     9
1    b     4    61
2    c    24     9
  name attr1 attr2
0    a     5     9
1    b     4    61
2    c    24     9

This is because, when resolving a name, Python first checks the local scope (local to your print_df_in_a_function). If it doesn't find the name there, it checks any enclosing scope. In this case, the scope of main has a df1, so the name resolution ends there. However, with the original module-level definition of print_df_in_a_function, the function still finds the global df1 even if you delete the name df1 in main:

In [5]: import numpy as np
   ...: import pandas as pd
   ...:
   ...: df1 = None
   ...:
   ...: def main():
   ...:     df1 = pd.DataFrame(np.array([['a', 5, 9],
   ...:                                  ['b', 4, 61],
   ...:                                  ['c', 24, 9]]),
   ...:                                  columns=['name', 'attr1', 'attr2'])
   ...:
   ...:     print(df1)
   ...:     del df1
   ...:     print_df_in_a_function()
   ...:
   ...: def print_df_in_a_function():
   ...:     print(df1)
   ...:
   ...: if __name__ == '__main__':
   ...:     main()
   ...:
  name attr1 attr2
0    a     5     9
1    b     4    61
2    c    24     9
None
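
The same lookup order can be seen without pandas. The following is only a minimal sketch; the names label, show_label, caller_with_local and caller_with_nested are made up for illustration and do not come from the question.

```python
label = "global"  # module-level name, playing the role of df1 = None

def show_label():
    # No local or enclosing 'label' exists here, so lookup falls through
    # to the module (global) scope.
    return label

def caller_with_local():
    # This local is invisible to show_label: a caller's locals are never searched.
    label = "local to caller"
    return show_label()

def caller_with_nested():
    label = "enclosing"

    def show_label_nested():
        # Defined inside caller_with_nested, so that enclosing scope is
        # searched before the module scope.
        return label

    return show_label_nested()

print(caller_with_local())   # global
print(caller_with_nested())  # enclosing
```

What matters is where the function is defined, not where it is called from.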
juanpa.arrivillaga

For this you need to understand how variable scope works. Take a look at this!

def my_func():
    index3 = 5000
    print(index3)

index3 = 10
print(index3)
my_func()

output:

10
5000

Note: There are two variables named index3 here. You might think they are the same, but they are not.

The index3 inside my_func is a local variable, while the index3 at module level (outside any function) is a different variable. So in the code above, the first print(index3) prints the module-level index3, then my_func() is called and the print(index3) inside it prints the local index3.

Take a look at this!

def my_func():
    print(index3)

index3 = 10
print(index3)
my_func()

output:

10
10

Now both calls print 10. my_func has no local index3, so the name resolves to the global variable, which is printed twice.

Now look:

def my_func():
    index3 =index3+1

index3 = 10
print(index3)
my_func()

output:

10
Traceback (most recent call last):
  File "/home/mr/func.py", line 6, in <module>
    my_func()
  File "/home/mr/func.py", line 2, in my_func
    index3 =index3+1
UnboundLocalError: local variable 'index3' referenced before assignment

Why?

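Because of the assignment in index3 = index3 + 1. When Python compiles the function, any name that is assigned anywhere in the body becomes local to that function. So index3 = 0 on its own would simply assign 0 to a new local variable.

However, index3 = index3 + 1 first has to read index3, and the local variable has no value yet. In effect Python says:

Wait, you want me to set the local variable index3 to the local variable index3 plus 1? But you haven't assigned it yet!

Declaring the name as global inside the function avoids the error: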

def my_func():
    global index3
    index3 = index3 + 1
    print(index3)

index3=10
print(index3)
my_func()
print(index3)

output:

10
11
11

Now the function reads the global index3 and rebinds it, so the change made inside the function is visible outside it.
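
To check that Python decides whether a name is local before any line of the function runs, you can inspect the compiled function. This is a rough sketch using the standard __code__.co_varnames attribute, which lists a function's local variable names; the three function names are made up for illustration.

```python
def reads_global():
    # index3 is never assigned here, so it is looked up in the module scope.
    return index3

def assigns_local():
    # The assignment makes index3 local for the whole function body,
    # so the read on the right-hand side fails.
    index3 = index3 + 1
    return index3

def declares_global():
    global index3  # assignments now rebind the module-level name
    index3 = index3 + 1
    return index3

index3 = 10

print("index3" in reads_global.__code__.co_varnames)     # False
print("index3" in assigns_local.__code__.co_varnames)    # True
print("index3" in declares_global.__code__.co_varnames)  # False

try:
    assigns_local()
except UnboundLocalError as exc:
    print("UnboundLocalError:", exc)

print(declares_global())  # 11
```

The exact wording of the error message differs between Python versions, but the exception type is the same.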

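NOTE: Modifying global variables from inside functions is generally considered bad coding practice. One way to avoid it is to read the value through a function and work on a local copy: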

def getIndex3():
    return index3

def my_func():
    index3 = getIndex3()
    index3 = index3 + 1
    print(index3)

index3=10
print(index3)
my_func()
print(index3)

Now output:

10
11
10
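
Another option, probably the more common one, is to pass the value in as an argument and return the result, so the function does not depend on any module-level name. A minimal sketch, with incremented as a made-up helper name:

```python
def incremented(value):
    """Return value + 1 without reading or changing any module-level name."""
    return value + 1

index3 = 10
result = incremented(index3)

print(index3)  # 10, the original is untouched
print(result)  # 11
```

The caller decides what to do with the result, including rebinding index3 itself if that is really wanted.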

So in your case.

def print_df_in_a_function():
    print(df1)

This simply resolves to the df1 = None defined globally at the top of your program. It does not refer to the df1 in your main.

However, you can achieve what you want by passing the df1 from main as an argument:

print_df_in_a_function(df1)

Now the df1 that holds your DataFrame is passed to print_df_in_a_function(df1), which can then print it. Like this:

import numpy as np
import pandas as pd

df1 = None

def main():
    df1 = pd.DataFrame(np.array([['a', 5, 9],
                                 ['b', 4, 61],
                                 ['c', 24, 9]]),
                                 columns=['name', 'attr1', 'attr2'])

    print(df1)
    print_df_in_a_function(df1)

def print_df_in_a_function(df1):
    print(df1)

if __name__ == '__main__':
    main()

Output:

  name attr1 attr2
0    a     5     9
1    b     4    61
2    c    24     9
  name attr1 attr2
0    a     5     9
1    b     4    61
2    c    24     9
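
Once the data is passed explicitly, the module-level df1 = None is no longer needed at all. The sketch below shows that shape under stated assumptions: a plain list of tuples stands in for the DataFrame so the snippet runs without pandas, and build_rows and format_rows are hypothetical names, not part of the original code.

```python
def build_rows():
    # Stand-in for the pd.DataFrame(...) construction in main().
    return [("a", 5, 9), ("b", 4, 61), ("c", 24, 9)]

def format_rows(rows):
    # Works only on its argument; no module-level name is involved.
    return "\n".join(f"{name} {attr1} {attr2}" for name, attr1, attr2 in rows)

def main():
    rows = build_rows()
    print(format_rows(rows))

if __name__ == "__main__":
    main()
```

With this layout each function can be called and tested on its own.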
void

I took a simpler example, and the global statement does solve the problem.

df1 = None

def main():
    global df1  # makes df1 in main refer to the module-level variable
    df1 = 1
    print(df1)
    print_df_in_a_function()

def print_df_in_a_function():
    print(df1)

main()

Your code would be:

import numpy as np
import pandas as pd

df1 = None

def main():
    global df1
    df1 = pd.DataFrame(np.array([['a', 5, 9],
                                 ['b', 4, 61],
                                 ['c', 24, 9]]),
                                 columns=['name', 'attr1', 'attr2'])

    print(df1)
    print_df_in_a_function()

def print_df_in_a_function():
    print(df1)

if __name__ == '__main__':
    main()
hridayns
  • @Roy Oh and be sure to use only tabs for the indenting. – hridayns Jul 03 '17 at 04:58
  • **Spaces are the preferred indentation method**. It is mentioned in PEP 8. Take a look: https://www.python.org/dev/peps/pep-0008/#tabs-or-spaces. So *only use tabs* is not the right thing to say. – void Jul 03 '17 at 05:01
  • @s_vishnu Wow, I definitely did not know that. Every time I used spaces, I got so confused with the indentation. Thanks. – hridayns Jul 03 '17 at 05:08
  • I wanted to upvote your answer, but not with the claim that the *global directive does solve the problem*. Using globals would only create a hell of a lot of problems. Read about them and avoid them whenever you can. – void Jul 03 '17 at 05:16
  • @s_vishnu It does? It worked for me. Okay, I will read about them soon. A link would be great. Thanks – hridayns Jul 03 '17 at 05:17
  • Look at my answer for an alternative. Instead of using *global* – void Jul 03 '17 at 05:17
  • Take a look at this.. http://wiki.c2.com/?GlobalVariablesAreBad – void Jul 03 '17 at 05:18
  • @s_vishnu I see. Thanks. – hridayns Jul 03 '17 at 05:18