
I have a pandas DataFrame whose columns sometimes contain NaN values. I know I could remove these en masse with pandas' own dropna function, but in this case I wanted to write my own function that I can call on each column individually, so I wrote:

data = pd.read_csv('data.csv')
def remove_nans_from_column(column_name):
    data = data[~data[column_name].isna()]
remove_nans_from_column('bmi')

But running this produces this error:

---------------------------------------------------------------------------
UnboundLocalError                         Traceback (most recent call last)
<ipython-input-28-3be0ddcb71ff> in <module>()
----> 1 remove_nans_from_column('bmi')

<ipython-input-26-911d33fb618e> in remove_nans_from_column(column_name)
      1 def remove_nans_from_column(column_name):
----> 2     data = data[~data[column_name].isna()]

UnboundLocalError: local variable 'data' referenced before assignment

I understand that the variable data is not defined inside the function, but I expected the function to be able to find it in the surrounding module-level code. The same statement,

data = data[~data['bmi'].isna()]

works when it is not inside a function, so why does it not work inside one?

I am more curious about the reason for this error than about how to fix it, which I can already do.
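For reference, the behaviour seems to reproduce without pandas at all. The following is only a minimal sketch: the names `counter`, `read_only` and `rebind` are hypothetical and are not part of the original code. It suggests that reading a module-level name inside a function is fine, while assigning to that same name anywhere in the function body makes Python treat it as local for the whole body.

```python
counter = 10  # module-level name, standing in for `data`


def read_only():
    # No assignment to `counter` in this body, so the lookup
    # falls through to the module-level name.
    return counter + 1


def rebind():
    # The assignment below marks `counter` as local for the entire
    # function body, so the right-hand side reads a local that has
    # not been given a value yet.
    counter = counter + 1
    return counter


print(read_only())

try:
    rebind()
except UnboundLocalError as exc:
    print(type(exc).__name__)
```

The `rebind` function has the same shape as `remove_nans_from_column` above: the name on the left of the `=` is also read on the right.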

Semihcan Doken
  • https://stackoverflow.com/questions/22439752/python-local-vs-global-variables – Chris Jan 04 '19 at 22:35
  • What about using the built-in `dropna()` method instead? – G. Anderson Jan 04 '19 at 22:41
  • @Chris thanks for the link. I already knew what variable scopes were when I posted the question. If I understand the link you sent correctly, the answer is that Python gets confused when you modify a variable from the larger scope within a function, and that this should be avoided? – Semihcan Doken Jan 04 '19 at 22:45
  • [9.2. Python Scopes and Namespaces](https://docs.python.org/3/tutorial/classes.html#python-scopes-and-namespaces) – wwii Jan 05 '19 at 04:20
