Pandas: df (dataframe) is not defined

Question

I'm trying to load and edit a DataFrame from an xlsx file. The file is located in the path stored in the variable einlesen. Once the bug is fixed, I want to delete a row and save the new DataFrame to a new xlsx file in a specific path.

import os
import re
import pandas as pd
import glob
import time

def setwd():
    from pathlib import Path
    import os

    home = str(Path.home())
    
    os.chdir(home + r'\...\...\Staffing Report\Input\...\Raw_Data')
    
    latest = home + r'\...\...\Staffing Report\Input\MyScheduling\Raw_Data'
    
    folders = next(os.walk(latest))[1]
    creation_times = [(folder, os.path.getctime(folder)) for folder in folders]
    creation_times.sort(key=lambda x: x[1])
    
    most_recent = creation_times[-1][0]
    print('test' + most_recent)
    
    os.chdir(latest + '\\' + most_recent + '\\')
    
    print('current cwd is: ' + os.getcwd())
    
    save_dir = home + r'\...\...\Staffing Report\Input\MyScheduling\Individual Status All\PBI' + '\\' + 'Individual_Status.xlsx'
    

def rowdrop():
    
    einlesen = os.getcwd()
    print('test einlesen: ' + einlesen)
    
    df = pd.DataFrame()
    df = pd.read_excel('Individual Status.xls', sheet_name = 'Individual Status Raw Data')
    df = pd.DataFrame(df)

#main

setwd()
rowdrop()

df.to_excel(save_dir, index = False)

print(df)

When I try to run the code, I always get:

---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-92-060708f6b065> in <module>
      2 rowdrop()
      3 
----> 4 df.to_excel(save_dir, index = False)
      5 
      6 print(df)

NameError: name 'df' is not defined

score -1 · Accepted Answer · answered Feb 22 '21 at 14:11

You get the error because df is only defined inside the rowdrop function. Variables defined inside a function are local to it and cannot be accessed from outside unless you do something to change that.
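
As a minimal illustration of that scoping rule (the names make_value and result are made up for this sketch and do not come from the question's code):

```python
def make_value():
    # 'result' is a local name: it exists only while make_value() runs.
    result = [1, 2, 3]


make_value()

try:
    result  # looked up at module level, where it was never assigned
except NameError as exc:
    message = str(exc)

print(message)
```

The message printed here has the same form as the one in the question's traceback, just with a different variable name.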

Change your function to return the df:

def rowdrop():
    
    einlesen = os.getcwd()
    print('test einlesen: ' + einlesen)
    
    df = pd.DataFrame()
    df = pd.read_excel('Individual Status.xls', sheet_name = 'Individual Status Raw Data')
    df = pd.DataFrame(df)
    return df

And assign the returned value of the function call to a variable:

df = rowdrop()
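
Note that save_dir in the question has the same problem: it is assigned inside setwd() and then used at module level, so it would raise the next NameError once df is fixed. A rough sketch of returning both values follows. build_save_path and load_status are hypothetical names, 'output' is a placeholder folder rather than the asker's elided path, and a small in-memory DataFrame with invented columns stands in for the pd.read_excel call so the snippet runs without the Excel file.

```python
from pathlib import Path

import pandas as pd


def build_save_path(base_dir):
    # Return the target path instead of leaving it in a local variable.
    return Path(base_dir) / 'Individual_Status.xlsx'


def load_status():
    # Stand-in for pd.read_excel(...); the column names are invented.
    df = pd.DataFrame({'Name': ['A', 'B', 'C'], 'Status': ['ok', 'ok', 'open']})
    return df


# main
save_dir = build_save_path('output')  # 'output' is a placeholder folder
df = load_status()

print(save_dir.name)
print(df.shape)
# With the real data you would now call: df.to_excel(save_dir, index=False)
```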

Another way, which is considered bad practice, is to use the global statement to make df a global variable:

def rowdrop():
    global df
    einlesen = os.getcwd()
    print('test einlesen: ' + einlesen)

    df = pd.DataFrame()
    df = pd.read_excel('Individual Status.xls', sheet_name = 'Individual Status Raw Data')
    df = pd.DataFrame(df)

With the above method you won't need to assign the result of the function call to a variable, but please avoid it; see "Why are global variables evil?"

score -1 · Answer 2 · answered Feb 22 '21 at 14:11

You should return the DataFrame from your rowdrop function. I would also point out that the function's name might not be the most fitting, since it also creates and returns a DataFrame.

def rowdrop():
    
    einlesen = os.getcwd()
    print('test einlesen: ' + einlesen)
    
    df = pd.DataFrame()
    df = pd.read_excel('Individual Status.xls', sheet_name = 'Individual Status Raw Data')
    df = pd.DataFrame(df)
    return df

#main

setwd()
df = rowdrop()

score -1 · Answer 3 · answered Feb 22 '21 at 14:16

That NameError happens because you're referring to the variable df outside the function rowdrop(), where it is not defined. One option is to call df.to_excel(save_dir, index = False) inside that function (save_dir would then have to be available there as well, since it is currently local to setwd()).

Searching for "variable scope in Python" will turn up more background.

Also, you're doing unnecessary steps there. Calling df = pd.read_excel(...) is enough to load the Excel sheet into a pandas DataFrame.

def rowdrop():
    
    einlesen = os.getcwd()
    print('test einlesen: ' + einlesen)
    
    df = pd.read_excel('Individual Status.xls', sheet_name = 'Individual Status Raw Data')

You could then use the df.drop() method to remove the row you want, and then save the result with df.to_excel().

See more: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.drop.html
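
A small sketch of the drop step, assuming a made-up three-row DataFrame in place of the real Excel data and the hypothetical function name drop_row. DataFrame.drop returns a new DataFrame by default, so the result has to be assigned or returned.

```python
import pandas as pd


def drop_row(df, label):
    # drop() does not modify df in place by default; it returns a copy
    # without the row whose index label equals `label`.
    return df.drop(index=label)


# Invented sample data standing in for pd.read_excel(...)
df = pd.DataFrame({'Name': ['A', 'B', 'C'], 'Status': ['ok', 'ok', 'open']})

trimmed = drop_row(df, 1)  # remove the row with index label 1

print(len(df), len(trimmed))
print(trimmed['Name'].tolist())
# Saving would then be: trimmed.to_excel(save_dir, index=False)
```

Writing the file with to_excel additionally requires an Excel writer engine such as openpyxl to be installed.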
