
I would like to remove the leading and trailing zeros from each event (level 1) but not the zeros surrounded by non-zero numbers.

The following finds and removes all zeros:

df = events[event_no][events[event_no] != 0]
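Here is a minimal sketch showing why that mask is not enough. It uses a hypothetical stand-in for `events[event_no]`: event 2 from the data below, with the date strings as a flat index. The interior zero on 2/05/2007 is dropped along with the trailing one.

```python
import pandas as pd

# Hypothetical stand-in for events[event_no]: event 2 from the sample data
event = pd.Series([53.2, 0.0, 21.5, 2.5, 0.0],
                  index=['1/05/2007', '2/05/2007', '3/05/2007', '4/05/2007', '5/05/2007'])

# The != 0 mask removes every zero, so the interior zero on 2/05/2007 is lost as well
all_nonzero = event[event != 0]
print(all_nonzero)
```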

I have the following hierarchical series:

   1    2/09/2010   0
        3/09/2010   1.5
        4/09/2010   4.3
        5/09/2010   5.1
        6/09/2010   0
   2    1/05/2007   53.2
        2/05/2007   0
        3/05/2007   21.5
        4/05/2007   2.5
        5/05/2007   0

and want:

   1    3/09/2010   1.5
        4/09/2010   4.3
        5/09/2010   5.1
   2    1/05/2007   53.2
        2/05/2007   0
        3/05/2007   21.5
        4/05/2007   2.5

I have read "Deleting DataFrame row in Pandas based on column value" and "Filter columns of only zeros from a Pandas data frame" but have not been able to solve this problem.

mellover



What does your DataFrame look like? It shouldn't make any difference either way; Boolean indexing should do it:
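A different, fully vectorized Boolean-indexing approach is sketched below. It is not the method used in this answer. Within each event, it marks every row from the first non-zero value onwards and every row up to the last non-zero value, then keeps the intersection.

The sketch rebuilds the sample series from this thread and assumes that the first index level identifies the event. It applies `groupby(level=0).cummax()` to a 0/1 indicator, and running the same step on the reversed series handles the trailing side. It also removes runs of several leading or trailing zeros.

```python
import pandas as pd

# Sample series with the same shape as in the question (first level = event)
index = pd.MultiIndex.from_tuples(
    [(1, '2/09/2010'), (1, '3/09/2010'), (1, '4/09/2010'), (1, '5/09/2010'), (1, '6/09/2010'),
     (2, '1/05/2007'), (2, '2/05/2007'), (2, '3/05/2007'), (2, '4/05/2007'), (2, '5/05/2007')],
    names=['first', 'second'])
s = pd.Series([0., 1.5, 4.3, 5.1, 0., 53.2, 0., 21.5, 2.5, 0.], index=index)

nonzero = (s != 0).astype(int)
# 1 from the first non-zero value of each event onwards
after_first = nonzero.groupby(level=0).cummax().astype(bool)
# 1 up to the last non-zero value of each event (cumulative max over the reversed rows)
before_last = nonzero[::-1].groupby(level=0).cummax()[::-1].astype(bool)

trimmed = s[after_first & before_last]
print(trimmed)
```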

In [101]: print(df)

Out[101]:
                   c1
first second         
1     2/09/2010   0.0
      3/09/2010   1.5
      4/09/2010   4.3
      5/09/2010   5.1
      6/09/2010   0.0
2     1/05/2007  53.2
      2/05/2007   0.0
      3/05/2007  21.5
      4/05/2007   2.5
      5/05/2007   0.0


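In [102]:

import numpy as np
# Row positions where the first index level changes (the first row of every event after the first)
is_edge = np.argwhere(np.hstack((0, np.diff([item[0] for item in df.index.tolist()]))) != 0).flatten()
# Edge rows: each event start, the row just before it, and the very first and last rows
is_edge = np.hstack((is_edge, is_edge - 1, 0, len(df) - 1))
# Keep every non-zero row plus the zero rows that are not on an edge
g_idx = np.hstack((np.array([item for item in np.argwhere(df['c1'].values == 0).flatten() if item not in is_edge], dtype=int),
                   np.argwhere(df['c1'].values != 0).flatten()))
# .iloc replaces the old .ix, which was removed in pandas 1.0
print(df.iloc[sorted(g_idx)])
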
Out[102]:
                   c1
first second         
1     3/09/2010   1.5
      4/09/2010   4.3
      5/09/2010   5.1
2     1/05/2007  53.2
      2/05/2007   0.0
      3/05/2007  21.5
      4/05/2007   2.5

If you have a Series rather than a DataFrame (say it is called `s`), you can either:

Convert it to a DataFrame:

df = s.to_frame('c1')

Or work with the Series directly:

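In [113]:
import numpy as np
# Same edge logic as above, applied to the Series values
is_edge = np.argwhere(np.hstack((0, np.diff([item[0] for item in s.index.tolist()]))) != 0).flatten()
is_edge = np.hstack((is_edge, is_edge - 1, 0, len(s) - 1))
g_idx = np.hstack((np.array([item for item in np.argwhere(s.values == 0).flatten() if item not in is_edge], dtype=int),
                   np.argwhere(s.values != 0).flatten()))
# Positional selection, so use .iloc rather than []
s.iloc[sorted(g_idx)]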
Out[113]:
first  second   
1      3/09/2010     1.5
       4/09/2010     4.3
       5/09/2010     5.1
2      1/05/2007    53.2
       2/05/2007     0.0
       3/05/2007    21.5
       4/05/2007     2.5
dtype: float64
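Another option for a Series is to slice each event by position, from its first non-zero value to its last. This is a different technique from the edge-index method above. The sketch assumes the first index level identifies the event, and `trim_zeros_per_event` is a hypothetical helper name. Unlike a single-row edge check, it also removes runs of several leading or trailing zeros.

```python
import numpy as np
import pandas as pd

def trim_zeros_per_event(s):
    """Return s without the leading/trailing zeros of each first-level group."""
    def trim(group):
        nz = np.flatnonzero(group.values)       # positions of the non-zero values
        if len(nz) == 0:
            return group.iloc[0:0]              # an all-zero event is dropped entirely
        return group.iloc[nz[0]:nz[-1] + 1]     # first..last non-zero, inclusive
    # group_keys=False keeps the original MultiIndex instead of prepending the group key
    return s.groupby(level=0, group_keys=False).apply(trim)

# Hypothetical example with runs of zeros at both ends of event 1
index = pd.MultiIndex.from_tuples(
    [(1, 'a'), (1, 'b'), (1, 'c'), (1, 'd'), (1, 'e'), (1, 'f'),
     (2, 'g'), (2, 'h'), (2, 'i')],
    names=['first', 'second'])
s = pd.Series([0., 0., 3., 0., 4., 0., 7., 0., 8.], index=index)
result = trim_zeros_per_event(s)
print(result)
```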

For reference, I generated the series with:

In [116]:
import numpy as np
import pandas as pd
tuples = [(1, '2/09/2010'),
          (1, '3/09/2010'),
          (1, '4/09/2010'),
          (1, '5/09/2010'),
          (1, '6/09/2010'),
          (2, '1/05/2007'),
          (2, '2/05/2007'),
          (2, '3/05/2007'),
          (2, '4/05/2007'),
          (2, '5/05/2007')]
index = pd.MultiIndex.from_tuples(tuples, names=['first', 'second'])
s = pd.Series(np.array([0., 1.5, 4.3, 5.1, 0., 53.2, 0., 21.5, 2.5, 0.]), index=index)
s
Out[116]:
first  second   
1      2/09/2010     0.0
       3/09/2010     1.5
       4/09/2010     4.3
       5/09/2010     5.1
       6/09/2010     0.0
2      1/05/2007    53.2
       2/05/2007     0.0
       3/05/2007    21.5
       4/05/2007     2.5
       5/05/2007     0.0
dtype: float64

Is this the same structure as yours?

CT Zhu
  • That doesn't match the OP's desired output. "I would like to remove the leading and trailing zeros from each event (level 1) but not the zeros surrounded by non-zero numbers." – DSM Feb 10 '14 at 06:14
  • I missed that, and I don't know if there is a more elegant way of doing it. One has to find the edges of the first level no matter what, which requires at least a few lines. – CT Zhu Feb 10 '14 at 09:07
  • This solution requires the 1st level index to be numerical. – CT Zhu Feb 10 '14 at 09:26
  • Thanks. I apologize, but it seems I have a hierarchical series, and `df['c1']` will not work. I have tried replacing this with `df.values==0` but then it does not remove the zeros. I can change the series to a DF using `df = events.unstack(0)` but this produces a very large DF with few data points and many NaNs. Any suggestions? – mellover Feb 10 '14 at 23:44
  • You have a `series` rather than a `Dataframe`? Ok, see edit. (Maybe you want to edit your title accordingly). – CT Zhu Feb 11 '14 at 03:56
  • Thank you for the help, this worked great. Sorry for the confusion. – mellover Feb 11 '14 at 22:24
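One comment notes that the edge method requires the first index level to be numeric, because `np.diff` is applied to the labels. A minimal workaround keeps the same edge logic but first replaces the first-level labels with integer codes from `pd.factorize`. The string event labels 'a' and 'b' below are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical series whose event labels are strings rather than numbers
index = pd.MultiIndex.from_tuples(
    [('a', '2/09/2010'), ('a', '3/09/2010'), ('a', '4/09/2010'),
     ('b', '1/05/2007'), ('b', '2/05/2007'), ('b', '3/05/2007')],
    names=['first', 'second'])
s = pd.Series([0., 1.5, 0., 53.2, 0., 2.5], index=index)

# pd.factorize maps each label to an integer code, so np.diff works again
codes, _ = pd.factorize(s.index.get_level_values(0))
starts = np.argwhere(np.hstack((0, np.diff(codes))) != 0).flatten()
is_edge = np.hstack((starts, starts - 1, 0, len(s) - 1))

# Keep non-zero rows plus zero rows that are not on an edge
zero_pos = np.argwhere(s.values == 0).flatten()
keep_zeros = np.array([p for p in zero_pos if p not in is_edge], dtype=int)
g_idx = np.hstack((keep_zeros, np.argwhere(s.values != 0).flatten()))
result = s.iloc[sorted(g_idx)]
print(result)
```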