
I have gone through all the posts on the website and am not able to find a solution to my problem.

I have a dataframe with 15 columns. Some of them contain None or NaN values. I need help writing the if-else condition.

If the value in the startTime column is neither None nor NaN, I need to format it as a datetime. The current code is below:

for index, row in df_with_job_name.iterrows():
    start_time=df_with_job_name.loc[index,'startTime']
    if not df_with_job_name.isna(df_with_job_name.loc[index,'startTime']) :
        start_time_formatted = datetime(*map(int, re.split('[^\d]', start_time)[:-1]))

The error I am getting is:

if not df_with_job_name.isna(df_with_job_name.loc[index,'startTime']) :
TypeError: isna() takes exactly 1 argument (2 given)
Prune
  • Thanks. I tried using null_map_df = df_with_job_name.isna(). It worked for a small number of test records, but while I was iterating over 700 items in this dataframe it returned False for one of the NaN values. – user10146633 Jul 29 '18 at 20:29

2 Answers


isna takes your entire data frame as the instance argument (that's self, if you're already familiar with classes) and returns a data frame of Boolean values, True wherever a value is missing. You tried to pass the individual value you're checking as a second argument. DataFrame.isna doesn't work that way; it is called with empty parentheses.

You have a couple of options. One is to follow the individual checking tactics here. The other is to build the Boolean map for the entire data frame and use that:
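To make the difference concrete, here is a small sketch. The sample data and the jobName column are made up for illustration; only the startTime column name comes from the question. It also uses the top-level pd.isna function, which, unlike the DataFrame method, does accept a single value.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; only the column name 'startTime' comes from the question.
df = pd.DataFrame({
    "startTime": ["2018-07-29T20:29:00Z", None, np.nan],
    "jobName": ["a", "b", "c"],
})

# DataFrame.isna() takes no arguments and returns a Boolean frame of the same shape.
null_map = df.isna()

# pd.isna() is the top-level function that accepts a single value.
first_missing = pd.isna(df.loc[0, "startTime"])
second_missing = pd.isna(df.loc[1, "startTime"])
third_missing = pd.isna(df.loc[2, "startTime"])
```

So inside a loop, pd.isna(value) is probably the closest thing to what the original call was trying to do.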

import re
from datetime import datetime

null_map_df = df_with_job_name.isna()

for index, row in df_with_job_name.iterrows():
    # Look up the Boolean flag for this row's 'startTime' cell.
    if not null_map_df.loc[index, 'startTime']:
        start_time = df_with_job_name.loc[index, 'startTime']
        start_time_formatted = datetime(*map(int, re.split(r'[^\d]', start_time)[:-1]))

Please check my use of row & column indices; the index, row handling doesn't look right. Also, you should be able to apply an any operation to the entire row at once.
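As a rough sketch of the whole-row any idea, assuming made-up sample data (the endTime column is hypothetical and not from the question):

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; column names other than 'startTime' are made up.
df = pd.DataFrame({
    "startTime": ["2018-07-29T20:29:00Z", None, "2018-07-30T08:00:00Z"],
    "endTime": ["2018-07-29T21:00:00Z", "2018-07-29T22:00:00Z", np.nan],
})

null_map = df.isna()

# True for each row that has at least one missing value in any column.
row_has_missing = null_map.any(axis=1)

# Keep only the rows where every column is present.
complete_rows = df[~row_has_missing]
```

This checks every column at once, so it only fits if a missing value anywhere in the row should cause the row to be skipped.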

Ben.T
Prune

A direct way to take care of missing/invalid values is probably:

def is_valid(val):
    if val is None:
        return False
    try:
        return not math.isnan(val)
    except TypeError:
        # Non-numeric values such as strings cannot be NaN.
        return True

and of course you'll have to import math.

Also, it seems isna is not invoked with any arguments and returns a dataframe of Boolean values (see link). You can iterate through both dataframes together to determine whether each value is valid.
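Putting the pieces together, a minimal sketch might look like the following. The sample timestamps and their ISO-like format with a trailing Z are assumptions, and parse_start_time is a hypothetical helper that just wraps the parsing expression from the question.

```python
import math
import re
from datetime import datetime

import pandas as pd


def is_valid(val):
    """Return False for None and float NaN, True for anything else."""
    if val is None:
        return False
    try:
        return not math.isnan(val)
    except TypeError:
        # Non-numeric values such as strings cannot be NaN.
        return True


def parse_start_time(text):
    # Same parsing expression as in the question: split on non-digits and
    # drop the last piece (assumed empty because of a trailing 'Z').
    return datetime(*map(int, re.split(r"[^\d]", text)[:-1]))


# Hypothetical sample data in an assumed ISO-like format; dtype=object keeps
# None and NaN exactly as given.
start_times = pd.Series(
    ["2018-07-29T20:29:00Z", None, float("nan")], dtype=object
)
df = pd.DataFrame({"startTime": start_times})

# Parse valid entries and leave None in place of missing ones.
formatted = [
    parse_start_time(value) if is_valid(value) else None
    for value in df["startTime"]
]
```

Missing entries come out as None here, so the result stays aligned with the original rows.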

Kevin He