Delete rows in subsequences that contain leading zeros in a dataframe

Question

I have a data frame containing a time series, in the following format:

A  B  C  201401 201402 201403
a1 b1 c1 100    200    300
a2 b2 c2 0      250    0

I used pandas.melt to flatten this data and got the following format:

A  B  C  YYYYMM Value
a1 b1 c1 201401 100
a1 b1 c1 201402 200
a1 b1 c1 201403 300
a2 b2 c2 201401 0
a2 b2 c2 201402 250
a2 b2 c2 201403 0
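
For reference, the reshaping step can be reproduced roughly as follows. This is only a sketch: the name `wide` and the sample values are assumptions that mirror the example above, and the sort is added so that each [A, B, C] group comes out in date order.

```python
import pandas as pd

# Hypothetical wide frame mirroring the example above.
wide = pd.DataFrame({
    "A": ["a1", "a2"],
    "B": ["b1", "b2"],
    "C": ["c1", "c2"],
    "201401": [100, 0],
    "201402": [200, 250],
    "201403": [300, 0],
})

# Turn the month columns into (YYYYMM, Value) rows, one row per group and month.
df = (
    wide.melt(id_vars=["A", "B", "C"], var_name="YYYYMM", value_name="Value")
        .sort_values(["A", "B", "C", "YYYYMM"])
        .reset_index(drop=True)
)
print(df)
```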

Now, for each combination of [A, B, C], I only want the time series starting from the first non-zero value, so my output should look like this:

A  B  C  YYYYMM Value
a1 b1 c1 201401 100
a1 b1 c1 201402 200
a1 b1 c1 201403 300
a2 b2 c2 201402 250
a2 b2 c2 201403 0

I tried,

df.groupby(['A','B','C']).apply(lambda x: x['Value'][np.where(x['Value']>0)[0][0]:])

This only gives me the Value series for each group and does not change the data frame in place. What should I do to achieve this?

Why don't you just filter the dataframe `df = df[df['Value'] > 0]`? — Mohamed Ali JAMAOUI, Aug 22 '17 at 08:44
Hi, this would eliminate all the zero values in the time series. I just want to eliminate leading zeros. I have changed the example for reference. — Hima, Aug 22 '17 at 08:49
I see no leading zeros in your examples; a leading zero in a number should look like this: 0100 — Mohamed Ali JAMAOUI, Aug 22 '17 at 08:53
If you treat A, B, C as a group with a time series of values 0, 250, 0, then to me 250, 0 is the time series with leading zeros eliminated. — Hima, Aug 22 '17 at 08:55

score 0 · Answer 1 · answered Aug 22 '17 at 16:26

I continued with your idea of grouping and then filtering. The basic idea is to take each group and find the index of the first non-zero Value, assuming the rows are already sorted by date, then ungroup and clean up.

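import numpy as np

def applyFunc(group):
    # `group` is the sub-frame for one (A, B, C) combination, sorted by date.
    row_values = np.array(group.Value)
    # Position of the first non-zero Value; None (keep every row) if all are zero.
    first_non_zero_index = next((i for i, x in enumerate(row_values) if x), None)
    return group.iloc[first_non_zero_index:]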

df.groupby(['A','B','C']).apply(applyFunc).drop(["A","B","C"],axis=1).reset_index().drop("level_3",axis=1)

Uses a snippet from https://stackoverflow.com/a/19502403/2750819
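
A possible alternative that avoids `apply` is to build a boolean mask with a grouped cumulative sum. This is only a sketch, assuming the rows are already sorted by date within each group; the names `nonzero`, `keep` and `result` are made up for illustration. Note one difference from `applyFunc`: a group whose values are all zero is dropped entirely here, whereas `applyFunc` keeps it unchanged.

```python
import pandas as pd

# Sample data in the melted format from the question.
df = pd.DataFrame({
    "A": ["a1"] * 3 + ["a2"] * 3,
    "B": ["b1"] * 3 + ["b2"] * 3,
    "C": ["c1"] * 3 + ["c2"] * 3,
    "YYYYMM": ["201401", "201402", "201403"] * 2,
    "Value": [100, 200, 300, 0, 250, 0],
})

# 1 where Value is non-zero, 0 otherwise.
nonzero = df["Value"].ne(0).astype(int)

# Running count of non-zero values within each (A, B, C) group.
# It stays at 0 only for the leading zeros, so "> 0" marks the rows to keep.
keep = nonzero.groupby([df["A"], df["B"], df["C"]]).cumsum().gt(0)

result = df[keep].reset_index(drop=True)
print(result)
```

Because the mask is aligned with the original index, the A, B and C columns stay in place and no index clean-up beyond `reset_index(drop=True)` is needed.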
