
I have a function that creates a DataFrame. Within the function I can print it, but I am doing something wrong in the return process, because I can't seem to use the DataFrame after running the function. Below is my dummy code and the resulting error.

import pandas as pd
def testfunction(new_df_to_output):
    new_df_to_output = pd.DataFrame()
    S1 = pd.Series([33,66], index=['a', 'b'])
    S2 = pd.Series([22,44], index=['a', 'b'])
    S3 = pd.Series([11,55], index=['a', 'b'])

    new_df_to_output = new_df_to_output.append([S1, S2, S3], ignore_index=True)
    print new_df_to_output
    print type(new_df_to_output)
    print dir()
    return new_df_to_output

testfunction('Desired_DF_name')

print dir()
print Desired_DF_name

The DataFrame prints properly within the function. The output of dir() shows that the DataFrame is not available after the function returns. Trying to print that DataFrame produces the following error:

Traceback (most recent call last):
  File "functiontest.py", line 21, in <module>
    print Desired_DF_name
NameError: name 'Desired_DF_name' is not defined

I am sure it is a simple mistake, but I can't find the solution after searching Stack Overflow and Python tutorials. Any guidance is greatly appreciated.

Bobby M

2 Answers


Inside testfunction, the variable new_df_to_output is essentially a label that you are assigning to the passed in object.

testfunction('Desired_DF_name') doesn't do what you think; it is assigning the value of the string 'Desired_DF_name' to the variable new_df_to_output; it is not creating a new variable named Desired_DF_name. Basically it's the same as writing new_df_to_output = 'Desired_DF_name'.

You want to save the DataFrame that is returned from the function into a variable. So instead of

testfunction('Desired_DF_name')

you want

def testfunction():
    ...
Desired_DF_name = testfunction()

(You can change the definition of testfunction to remove the new_df_to_output parameter. The function wasn't doing anything with it anyway because you immediately reassign the variable: new_df_to_output = pd.DataFrame().)
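To make the point about names concrete, here is a minimal sketch. The function make_value and the string 'Desired_name' are made up for illustration and are not part of the original code. It shows that the string passed in is simply bound to the parameter, and that rebinding the parameter inside the function creates nothing in the caller's namespace.

```python
def make_value(label):
    # label starts out bound to whatever the caller passed in (here, a string)
    before = label
    # rebinding the parameter only changes the local name inside this function
    label = [1, 2, 3]
    return before, label

# The only way to get the new object out is to capture the return value
before, result = make_value('Desired_name')

print(before)                        # the string that was passed in
print(result)                        # the list built inside the function
print('Desired_name' in globals())   # no variable with that name was created
```

The same reasoning applies to a DataFrame: whatever the function builds is only reachable outside if the caller assigns the returned object to a name.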

0x5453
  • thank you. none of the documentation or examples that i'd seen made my mistake clear the way that you did. – Bobby M May 15 '17 at 18:03

I think you really want something like this:

import pandas as pd

def testfunction():
    result = pd.DataFrame()
    S1 = pd.Series([33,66], index=['a', 'b'])
    S2 = pd.Series([22,44], index=['a', 'b'])
    S3 = pd.Series([11,55], index=['a', 'b'])
    # append returns a new DataFrame; it does not modify result in place
    result = result.append([S1, S2, S3], ignore_index=True)
    return result

Desired_DF_name = testfunction()
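One caveat for anyone reading this later: DataFrame.append was deprecated in pandas 1.4 and removed in pandas 2.0, so code that calls it will fail on a current installation. A possible equivalent, assuming the goal is simply one row per Series, is to build the frame directly from the list of Series. This is a sketch, not the only way to do it (pd.concat is another option).

```python
import pandas as pd

def testfunction():
    s1 = pd.Series([33, 66], index=['a', 'b'])
    s2 = pd.Series([22, 44], index=['a', 'b'])
    s3 = pd.Series([11, 55], index=['a', 'b'])
    # Each Series becomes one row; the shared index labels become the columns
    return pd.DataFrame([s1, s2, s3])

# Capture the returned DataFrame under whatever name you want to use
Desired_DF_name = testfunction()
print(Desired_DF_name)
```

The call site is unchanged: the name on the left of the assignment is what makes the DataFrame available after the function finishes.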

You should carefully read "Defining Functions" and "More on Defining Functions" in the Python documentation.

  • thank you for your time. your answer was helpful. I want to say, though, that i'd read those 2 links and the stackoverflow suggested question before posting, and did not understand what you and 0x5453 explained. Maybe it's in there and i was being dumb. Thanks again for your time. – Bobby M May 15 '17 at 18:02
  • The documentation goes into detail about the difference between a function that does not return a value and one that does: "Coming from other languages, you might object that fib is not a function but a procedure since it doesn’t return a value." I think you just need to read the documentation more closely and type in the examples. –  May 15 '17 at 19:00
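
As a small illustration of the distinction that quote draws (the function names show_total and get_total are invented for this sketch): a function with no return statement evaluates to None when called, while a function that returns a value hands back an object the caller can keep.

```python
def show_total(values):
    # Prints the sum but has no return statement, so the call evaluates to None
    print(sum(values))

def get_total(values):
    # Returns the sum so the caller can bind it to a name
    return sum(values)

a = show_total([1, 2, 3])
b = get_total([1, 2, 3])

print(a is None)   # the printing-only function gave nothing back
print(b)           # the returned value is available to the caller
```

In the question's code the function did return the DataFrame; what was missing was the assignment at the call site, as in the second line of the usage above.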