4

I have a data frame with multiple time series identified by uniquer id's. I would like to remove any time series that have only 0 values.

The data frame looks as follows,

id   date          value
AAA  2010/01/01    9
AAA  2010/01/02    10
AAA  2010/01/03    8
AAA  2010/01/04    4
AAA  2010/01/05    12
B    2010/01/01    0
B    2010/01/02    0
B    2010/01/03    0
B    2010/01/04    0
B    2010/01/05    0
CCC  2010/01/01    45
CCC  2010/01/02    46
CCC  2010/01/03    0
CCC  2010/01/04    0
CCC  2010/01/05    40

I want any time series with only 0 values to be removed so that the data frame look as follows,

id   date          value
AAA  2010/01/01    9
AAA  2010/01/02    10
AAA  2010/01/03    8
AAA  2010/01/04    4
AAA  2010/01/05    12
CCC  2010/01/01    45
CCC  2010/01/02    46
CCC  2010/01/03    0
CCC  2010/01/04    0
CCC  2010/01/05    40

This is a follow up to a previous question that was answered with a really great solution using the data.tables package.

R efficiently removing missing values from the start and end of multiple time series in 1 data frame

Community
  • 1
  • 1
sizeight
  • 719
  • 1
  • 8
  • 19

3 Answers3

6

If dat is a data.table, then this is easy to write and read :

dat[,.SD[any(value!=0)],by=id]

.SD stands for Subset of Data. This answer explains .SD very well.

Picking up on Gabor's nice use of ave, but without repeating the same variable name (DF) three times, which can be a source of typo bugs if you have a lot of long or similar variable names, try :

dat[ ave(value!=0,id,FUN=any) ]

The difference in speed between those two may be dependent on several factors including: i) number of groups ii) size of each group and iii) the number of columns in the real dat.

Community
  • 1
  • 1
Matt Dowle
  • 58,872
  • 22
  • 166
  • 224
  • Thanks for the solution. Your data.table package is really useful. Going to make data wrangling a lot easier for myself. – sizeight May 31 '12 at 05:11
2

Try this. No packages are used.

DF[ ave(DF$value != 0, DF$id, FUN = any), ]
G. Grothendieck
  • 254,981
  • 17
  • 203
  • 341
1

An easy plyr solution would be

ddply(mydat,"id",function(x) if (all(x$value==0)) NULL else x)

(seems to work OK) but there may be a faster solution with data.table ...

Ben Bolker
  • 211,554
  • 25
  • 370
  • 453