I am trying to iterate over the rows of a DataFrame and count consecutive rows in which a certain value is NaN or 0, restarting the count whenever the value changes to something other than NaN or 0. I would like to get something like this:

Value  Period
0      1
0      2
0      3
NaN    4
21     NaN
4      NaN
0      1
0      2
NaN    3

I wrote a function that takes a DataFrame as an argument and returns it with an additional column holding the count:

def calc_period(df):
    period_x = []
    sum_x = 0
    for i in range(1,df.shape[0]):
        if df.iloc[i,0] == np.nan or df.iloc[i,0] == 0:
            sum_x += 1
            period_x.append(sum_x)
        else:
            period_x.append(None)
            sum_x = 0
    period_x.append(sum_x)
    df['period_x'] = period_x
    return df

The function works well when the value is 0, but when the value is NaN the count is also NaN, and I get the following result:

Value  Period
0      1
0      2
0      3
NaN    NaN
NaN    NaN
MBT
Blazej Kowalski
  • Can't you [replace NaNs with 0s](https://stackoverflow.com/questions/13295735/how-can-i-replace-all-the-nan-values-with-zeros-in-a-column-of-a-pandas-datafra) using `fillna`? – jjmontes Aug 27 '18 at 17:01

1 Answer

Here is a revised version of your code:

import pandas as pd
import numpy as np
import math

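def is_nan_or_zero(val):
    # NaN never compares equal to anything, so test for it with math.isnan
    return math.isnan(val) or val == 0

def calc_period(df):
    # Handle the first row up front; the loop below starts at the second row
    is_first_nan_or_zero = is_nan_or_zero(df.iloc[0, 0])
    period_x = [1 if is_first_nan_or_zero else np.nan]
    sum_x = 1 if is_first_nan_or_zero else 0
    for i in range(1, df.shape[0]):
        val = df.iloc[i, 0]
        if is_nan_or_zero(val):
            # Extend the current run of NaN/0 values
            sum_x += 1
            period_x.append(sum_x)
        else:
            # Any other value ends the run and resets the counter
            period_x.append(None)
            sum_x = 0
    df['period_x'] = period_x
    return df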

There were 2 fixes:

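  1. Replacing df.iloc[i,0] == np.nan with math.isnan(val). NaN never compares equal to anything, including itself, so the == np.nan check is always False.
  2. Removing period_x.append(sum_x) at the end and adding the first period value up front instead (since the loop starts iterating from the second value).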
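
As a possible alternative to the explicit loop, the same counting could probably be done with vectorized pandas operations. The following is only a sketch, not part of the original fix: the function name calc_period_vectorized and the column name "Value" are assumptions made for illustration. It marks rows that are NaN or 0, starts a new group at every other row, and takes a running count within each group.

```python
import numpy as np
import pandas as pd

def calc_period_vectorized(df, col="Value"):
    # Hypothetical helper; `col` is assumed to name the column to inspect.
    # True where the value is NaN or 0 (isna avoids the `== np.nan` pitfall)
    mask = df[col].isna() | df[col].eq(0)
    # Every row that is neither NaN nor 0 bumps the group id,
    # so each run of NaN/0 rows shares one id
    group_id = (~mask).cumsum()
    # Running count of matching rows within each group
    running = mask.astype(int).groupby(group_id).cumsum()
    out = df.copy()
    # Keep the count only on NaN/0 rows; other rows become NaN
    out["period_x"] = running.where(mask)
    return out

# Sample data taken from the question
sample = pd.DataFrame({"Value": [0, 0, 0, np.nan, 21, 4, 0, 0, np.nan]})
result = calc_period_vectorized(sample)
print(result)
```

Unlike the loop version, this sketch returns a copy rather than modifying the input DataFrame, which is a design choice and not something the question requires.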
zohar.kom