
I have the following data:

import pandas as pd

data = {'var1': ['pero03930', 'pero03930', ' '],
        'var2': ['121324', '232434', ' '],
        'var3': [343, 937, 989],
        }

df = pd.DataFrame(data, columns=['var1', 'var2', 'var3'])

print(df)

I'm trying to develop a function that identifies the missing values and this is what I have so far:

def missing_values(var1, var2, var3):
    if var1 is None:
        return 'Missing var1 in data'
    if var2 is None:
        return 'Missing var2 in data'
    if var3 is None:
        return 'missing var3 value in data'
    else:
        return 'No missing values in data'

print(missing_values(df))

I get this error:

TypeError: missing_values() missing 2 required positional arguments: 'var2' and 'var3'

I know this is because the function expects three arguments and I'm only passing the DataFrame, so it can't find the other two parameters. How do I get the function to recognize that the parameters are columns in the DataFrame? Or is there generally a better way to write this function?

Tomerikoo
Mrmoleje
  • Does this answer your question? [Find empty or NaN entry in Pandas Dataframe](https://stackoverflow.com/questions/27159189/find-empty-or-nan-entry-in-pandas-dataframe) – Tomerikoo May 26 '20 at 10:55

1 Answer


I believe you should use the built-in pandas function isnull() to find None values. Also note that " " != None: a string containing a space is not treated as missing.

import pandas as pd

data = {'var1':  ['pero03930', 'pero03930', None],
        'var2': ['121324', '232434', ' '],
        'var3': [343, 937, 989],
        }

df = pd.DataFrame(data, columns=['var1', 'var2', 'var3'])

print(df[df.isnull().any(axis=1)])

Output

   var1 var2  var3
2  None        989
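The question's own data marks blanks with ' ' rather than None, so isnull() alone would not flag them. A minimal sketch, assuming that any empty or whitespace-only string should count as missing: convert those strings to NaN first with DataFrame.replace and a regex, then filter rows with isnull().any(axis=1) as in the code above. The names cleaned and missing_rows are illustrative only.

```python
import numpy as np
import pandas as pd

# The question's original data, where ' ' was meant to mark a missing value
data = {'var1': ['pero03930', 'pero03930', ' '],
        'var2': ['121324', '232434', ' '],
        'var3': [343, 937, 989]}
df = pd.DataFrame(data)

# Assumption: empty or whitespace-only strings should be treated as missing,
# so replace them with NaN (non-string columns such as var3 are left alone)
cleaned = df.replace(r'^\s*$', np.nan, regex=True)

# Keep only the rows that contain at least one missing value
missing_rows = cleaned[cleaned.isnull().any(axis=1)]
print(missing_rows)
```

replace returns a new DataFrame, so the original df keeps its ' ' values; assign the result back to df if you want the change to stick.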

Your Code

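If you want to keep your own function, the following code makes it work by passing each row's values to it. Be aware of its limitations, though:

  • It returns as soon as it finds the first None, so it never reports any other None in the same row.
  • It checks "is None", which does not catch NaN (how pandas usually stores missing values in numeric columns) or blank strings such as " ".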
import pandas as pd

data = {'var1':  ['pero03930', 'pero03930', None],
        'var2': ['121324', '232434', ' '],
        'var3': [343, 937, 989],
        }

df = pd.DataFrame(data, columns=['var1', 'var2', 'var3'])


def missing_values(var1,var2,var3):
    if var1 is None:
        return 'Missing var1 in data'
    if var2 is None:
        return 'Missing var2 in data'
    if var3 is None:
        return 'missing var3 value in data'
    else:
        return 'No missing values in data'

for index, row in df.iterrows():
    print(missing_values(row["var1"], row["var2"], row["var3"]))

Output

No missing values in data
No missing values in data
Missing var1 in data
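If you do need every missing column in a row rather than stopping at the first one, here is a minimal sketch. It builds a boolean mask with DataFrame.isnull(), which catches both None and NaN, and collects the column names that are True in each row. The data is the answer's, except that var2 in row 2 is also set to None (an assumption, to show a row with two gaps); report is just an illustrative name.

```python
import pandas as pd

# Answer's data, but with var2 in row 2 also set to None (illustrative change)
data = {'var1': ['pero03930', 'pero03930', None],
        'var2': ['121324', '232434', None],
        'var3': [343, 937, 989]}
df = pd.DataFrame(data)

# Boolean DataFrame: True wherever a value is None or NaN
mask = df.isnull()

# For each row, keep the names of the columns whose mask value is True
report = {index: row[row].index.tolist() for index, row in mask.iterrows()}

for index, cols in report.items():
    if cols:
        print(f"Row {index}: missing {', '.join(cols)}")
    else:
        print(f"Row {index}: no missing values")
```

This prints one line per row, e.g. "Row 2: missing var1, var2" for the last row.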

If this solves your problem, please accept the answer; otherwise, comment with what's still not working.

PSKP
  • Hi, thanks for this. Can you explain why the code has lots of problems? It doesn't matter that it stops searching once a null or None is found; I just need to know whether there are one or more nulls. Using the function the way you have here does work. – Mrmoleje May 26 '20 at 12:13
  • If you don't care about other None values once one is found, then your code is OK. If your problem is solved, then please accept the answer. – PSKP May 26 '20 at 12:53
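Since the asker only needs to know whether there is at least one missing value anywhere (per the comment above), a per-row loop isn't strictly necessary. A minimal sketch reusing the answer's data; has_missing is a hypothetical helper name, and like isnull() it does not count " " as missing.

```python
import pandas as pd

data = {'var1': ['pero03930', 'pero03930', None],
        'var2': ['121324', '232434', ' '],
        'var3': [343, 937, 989]}
df = pd.DataFrame(data)

def has_missing(frame):
    """Return True if any cell in the DataFrame is None or NaN (hypothetical helper)."""
    # isnull() gives a boolean DataFrame; the first any() reduces each column
    # to one bool, and the second any() reduces those to a single bool
    return bool(frame.isnull().any().any())

if has_missing(df):
    print('Missing values in data')
else:
    print('No missing values in data')
```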