1

I have a data frame in following format with a time series

A  B  C  201401 201402 201403

a1 b1 c1  100    200    300
a2 b2 c2  0      250     0

I have used Pandas.melt to flatten this data and have got following format.

A  B  C  YYYYMM Value
a1 b1 c1 201401 100
a1 b1 c1 201402 200
a1 b1 c1 201403 300
a2 b2 c2 201401 0
a2 b2 c2 201402 250
a2 b2 c2 201403 0

Now for a particular combination of [A B C] I only want the time series starting from non zero values.so my output should be like this.

A  B  C  YYYYMM Value
a1 b1 c1 201401 100
a1 b1 c1 201402 200
a1 b1 c1 201403 300
a2 b2 c2 201402 250
a2 b2 c2 201403 0

I tried,

df.groupby(['A','B','C']).apply(lambda x: x['Value'][np.where(x['Value']>0)[0][0]:]

This just gives me time series and doesn't imply inplace changes. What should I do to achieve this?

vestland
  • 55,229
  • 37
  • 187
  • 305
Hima
  • 11,268
  • 3
  • 22
  • 31

1 Answers1

0

I continued with your idea of grouping and then filtering. The basic idea was to take each group and find the first non-zero Value's index assuming they are already sorted by date. And then just ungroup and clean up.

def applyFunc(row):
    row_values = np.array(row.Value)
    first_non_zero_index = next((i for i, x in enumerate(row_values) if x), None)
    return row.iloc[first_non_zero_index:]

df.groupby(['A','B','C']).apply(applyFunc).drop(["A","B","C"],axis=1).reset_index().drop("level_3",axis=1)

Uses a snippet from https://stackoverflow.com/a/19502403/2750819

Kent Shikama
  • 3,910
  • 3
  • 22
  • 55