Deleting row from hierarchical Series in Pandas based on column value and position

Question

I would like to remove the leading and trailing zeros from each event (level 1) but not the zeros surrounded by non-zero numbers.

The following works in finding and removing all zeros:

df = events[event_no][events[event_no] != 0]

I have the following hierarchical series:

   1    2/09/2010   0
        3/09/2010   1.5
        4/09/2010   4.3
        5/09/2010   5.1
        6/09/2010   0
   2    1/05/2007   53.2
        2/05/2007   0
        3/05/2007   21.5
        4/05/2007   2.5
        5/05/2007   0

and want:

   1    3/09/2010   1.5
        4/09/2010   4.3
        5/09/2010   5.1
   2    1/05/2007   53.2
        2/05/2007   0
        3/05/2007   21.5
        4/05/2007   2.5

I have read Deleting DataFrame row in Pandas based on column value and Filter columns of only zeros from a Pandas data frame but have been unsuccessful in solving this problem.

CT Zhu · Accepted Answer · 2014-02-11T04:06:10.107

What does your DataFrame look like? Either way, it shouldn't make any difference; simple Boolean indexing should do it:

In [101]: print(df)

Out[101]:
                   c1
first second         
1     2/09/2010   0.0
      3/09/2010   1.5
      4/09/2010   4.3
      5/09/2010   5.1
      6/09/2010   0.0
2     1/05/2007  53.2
      2/05/2007   0.0
      3/05/2007  21.5
      4/05/2007   2.5
      5/05/2007   0.0


In [102]:

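import numpy as np

# First row of each new group (positions where the first index level changes)
is_edge = np.argwhere(np.hstack((0, np.diff([item[0] for item in df.index.tolist()]))) != 0).flatten()
# Add the last row of each group, plus the first and last rows overall
is_edge = np.hstack((is_edge, is_edge - 1, 0, len(df) - 1))
# Keep zeros that are not on a group edge, plus every non-zero row
g_idx = np.hstack(([item for item in np.argwhere(df['c1'].values == 0).flatten() if item not in is_edge],
                   np.argwhere(df['c1'].values != 0).flatten())).astype(int)
print(df.iloc[sorted(g_idx)])  # .ix has been removed from pandas; .iloc selects by position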



Out[102]:
                   c1
first second         
1     3/09/2010   1.5
      4/09/2010   4.3
      5/09/2010   5.1
2     1/05/2007  53.2
      2/05/2007   0.0
      3/05/2007  21.5
      4/05/2007   2.5

If you have a series instead of a dataframe, say the series is s, you can either:

Convert it to a dataframe:

df = s.to_frame('c1')

Or:

In [113]:
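import numpy as np

# First row of each new group (positions where the first index level changes)
is_edge = np.argwhere(np.hstack((0, np.diff([item[0] for item in s.index.tolist()]))) != 0).flatten()
# Add the last row of each group, plus the first and last rows overall
is_edge = np.hstack((is_edge, is_edge - 1, 0, len(s) - 1))
# Keep zeros that are not on a group edge, plus every non-zero row
g_idx = np.hstack(([item for item in np.argwhere(s.values == 0).flatten() if item not in is_edge],
                   np.argwhere(s.values != 0).flatten())).astype(int)
s.iloc[sorted(g_idx)]  # select by position, not by index label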
Out[113]:
first  second   
1      3/09/2010     1.5
       4/09/2010     4.3
       5/09/2010     5.1
2      1/05/2007    53.2
       2/05/2007     0.0
       3/05/2007    21.5
       4/05/2007     2.5
dtype: float64

BTW, I generate the series by:

In [116]:
import pandas as pd

tuples = [(1, '2/09/2010'),
          (1, '3/09/2010'),
          (1, '4/09/2010'),
          (1, '5/09/2010'),
          (1, '6/09/2010'),
          (2, '1/05/2007'),
          (2, '2/05/2007'),
          (2, '3/05/2007'),
          (2, '4/05/2007'),
          (2, '5/05/2007')]
index = pd.MultiIndex.from_tuples(tuples, names=['first', 'second'])
s = pd.Series([0., 1.5, 4.3, 5.1, 0., 53.2, 0., 21.5, 2.5, 0.], index=index)
s
Out[116]:
first  second   
1      2/09/2010     0.0
       3/09/2010     1.5
       4/09/2010     4.3
       5/09/2010     5.1
       6/09/2010     0.0
2      1/05/2007    53.2
       2/05/2007     0.0
       3/05/2007    21.5
       4/05/2007     2.5
       5/05/2007     0.0
dtype: float64

Does this match the structure of your data?
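
One limitation of the edge-based code above: it only tests the very first and last row of each group, so if an event starts or ends with several zeros in a row, only the outermost one is dropped. A possible alternative, sketched here on the assumption that the data is a Series with the event number in index level 0, uses a cumulative sum within each group. The helper name `trim_edge_zeros` is made up for illustration and is not part of pandas.

```python
import pandas as pd


def trim_edge_zeros(s):
    """Drop leading and trailing zeros within each level-0 group (hypothetical helper)."""
    nonzero = s.ne(0).astype(int)
    # True from the first non-zero value of each group onward
    from_first = nonzero.groupby(level=0).cumsum() > 0
    # True up to the last non-zero value: same idea on the reversed series
    to_last = nonzero.iloc[::-1].groupby(level=0).cumsum().iloc[::-1] > 0
    return s[from_first & to_last]


# Same sample data as in the session above
index = pd.MultiIndex.from_tuples(
    [(1, '2/09/2010'), (1, '3/09/2010'), (1, '4/09/2010'),
     (1, '5/09/2010'), (1, '6/09/2010'),
     (2, '1/05/2007'), (2, '2/05/2007'), (2, '3/05/2007'),
     (2, '4/05/2007'), (2, '5/05/2007')],
    names=['first', 'second'])
s = pd.Series([0., 1.5, 4.3, 5.1, 0., 53.2, 0., 21.5, 2.5, 0.], index=index)

trimmed = trim_edge_zeros(s)
print(trimmed)
```

Zeros that sit between non-zero values are kept because both masks are true for them. A group that contains only zeros is dropped entirely, which may or may not be what you want.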

That doesn't match the OP's desired output. "I would like to remove the leading and trailing zeros from each event (level 1) but not the zeros surrounded by non-zero numbers." — DSM, Feb 10 '14 at 06:14
I missed that, don't know if there is a more elegant way of doing it. One has to find the edges of the first level no matter what. That requires a few lines at least. — CT Zhu, Feb 10 '14 at 09:07
Thanks, I apologize but it seems I have a hierarchical series and `df['c1']` will not work. I have tried replacing this with `df.values==0` but then it does not remove the zeros. I can change the series to a DF using `df = events.unstack(0)` but this produces a very large DF with few data points and many NaN's. Any suggestions? — mellover, Feb 10 '14 at 23:44
You have a `series` rather than a `Dataframe`? Ok, see edit. (Maybe you want to edit your title accordingly). — CT Zhu, Feb 11 '14 at 03:56
Thank you for the help, this worked great. Sorry for the confusion. — mellover, Feb 11 '14 at 22:24
