I want to exclude all rows that belong to incomplete weeks, i.e. rows whose week number in the column "ww" appears in fewer than seven consecutive rows. The filter must be conditional on this ww column.

In the sample data below, the rows for weeks 29 and 32 (in the ww column) should be excluded because those weeks do not span seven consecutive rows.

          Date     Close MarketCap CoinName datenum yyyy mm ww dd yyyymmdd
 1: 2018-07-19 0.4833250  19332999   0chain   17731 2018  7 29 19 20180719
 2: 2018-07-20 0.4328458  17313830   0chain   17732 2018  7 29 20 20180720
 3: 2018-07-21 0.3919436  15677744   0chain   17733 2018  7 29 21 20180721
 4: 2018-07-22 0.3772339  15089355   0chain   17734 2018  7 30 22 20180722
 5: 2018-07-23 0.3607929  14431715   0chain   17735 2018  7 30 23 20180723
 6: 2018-07-24 0.3531285  14125139   0chain   17736 2018  7 30 24 20180724
 7: 2018-07-25 0.3614665  14458661   0chain   17737 2018  7 30 25 20180725
 8: 2018-07-26 0.3782509  15130036   0chain   17738 2018  7 30 26 20180726
 9: 2018-07-27 0.3500712  14002849   0chain   17739 2018  7 30 27 20180727
10: 2018-07-28 0.3510113  14040452   0chain   17740 2018  7 30 28 20180728
11: 2018-07-29 0.3859281  15437126   0chain   17741 2018  7 31 29 20180729
12: 2018-07-30 0.3696146  14784582   0chain   17742 2018  7 31 30 20180730
13: 2018-07-31 0.3418870  13675481   0chain   17743 2018  7 31 31 20180731
14: 2018-08-01 0.3230662  12922649   0chain   17744 2018  8 31  1 20180801
15: 2018-08-02 0.2872402  11489610   0chain   17745 2018  8 31  2 20180802
16: 2018-08-03 0.2476886   9907543   0chain   17746 2018  8 31  3 20180803
17: 2018-08-04 0.2474120   9896481   0chain   17747 2018  8 31  4 20180804
18: 2018-08-05 0.2342555   9370222   0chain   17748 2018  8 32  5 20180805
19: 2018-08-06 0.3182011  12728042   0chain   17749 2018  8 32  6 20180806
20: 2018-08-07 0.2939107  11756427   0chain   17750 2018  8 32  7 20180807
  • Hi @Erik Stålman, please read [How to ask a good question](https://stackoverflow.com/help/how-to-ask) so that you can get better help. – Jilber Urbina Oct 24 '22 at 22:25
  • Following what @JilberUrbina suggested, please post your data so we can run code against it. Try pasting the output from `dput(AllcoinsSTATA3)` (and the same for the other two dataframes). If the dataframes are too large, just give us the first rows of each. – Ricardo Semião e Castro Oct 24 '22 at 22:27
  • Hi Erik, you are probably getting a lot of negative responses on this because [R-tagged posts](https://stackoverflow.com/questions/tagged/r) expressly request that you do not post data as a screenshot, and there are instructions on how to ask a question that is reproducible (see [here](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example), or what Jilber directed you to). – jpsmith Oct 25 '22 at 01:10
  • Thank you, I have now added a reproducible data set. Sorry, my first time posting. @jpsmith – Erik Stålman Oct 25 '22 at 11:31
  • Thank you I have now added a reproducible data set. Sorry my first time posting @JilberUrbina – Erik Stålman Oct 25 '22 at 11:34
  • Thank you I have now added a reproducible data set. Sorry my first time posting. @Ricardo Semião e Castro – Erik Stålman Oct 25 '22 at 11:35

1 Answer

There is probably a more elegant way to do this, but one approach is to number the rows within each run of consecutive equal week numbers (ww) using `sequence` and `rle`, then use `dplyr::group_by` and `dplyr::filter` to drop any week with fewer than 7 rows. The `any` call inside `filter` checks whether any row in the group reaches position 7, i.e. whether the week has at least 7 rows.
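To make the `rle`/`sequence` step concrete, here is a minimal base-R sketch on a made-up week vector (the values are illustrative, not the question's data). It shows how each element gets its position within its run of equal values:

```r
# Toy week-number vector (hypothetical values, not the question's data)
ww <- c(29, 29, 29, 30, 30, 30, 30, 30, 30, 30, 31, 31)

# rle() collapses the vector into runs of consecutive equal values
runs <- rle(ww)
runs$lengths   # 3 7 2
runs$values    # 29 30 31

# sequence() expands the run lengths into 1..n counters, one per run,
# so each element gets its position within its own run
pos <- sequence(runs$lengths)
pos            # 1 2 3 1 2 3 4 5 6 7 1 2
```

Only the run for week 30 reaches position 7, which is the condition the filter tests for.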

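# Assigned data provided in question as "df"
library(dplyr)

df2 <- df %>% 
  # position of each row within its run of consecutive equal ww values
  mutate(counts = sequence(rle(as.character(ww))$lengths)) %>%
  group_by(ww) %>%
  # keep a week only if one of its rows reaches position 7
  filter(any(counts == 7)) %>%
  select(-counts)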

# or save a few lines of code (within each ww group, ww is constant,
# so this keeps the groups that have at least 7 rows):

df3 <- df %>% 
  group_by(ww) %>%
  filter(any(sequence(rle(as.character(ww))$lengths) == 7))

Output:

#   Date       Close MarketCap CoinName datenum  yyyy    mm    ww    dd yyyymmdd
# <chr>      <dbl>     <int> <chr>      <int> <int> <int> <int> <int>    <int>
# 1 2018-07-22 0.377  15089355 0chain     17734  2018     7    30    22 20180722
# 2 2018-07-23 0.361  14431715 0chain     17735  2018     7    30    23 20180723
# 3 2018-07-24 0.353  14125139 0chain     17736  2018     7    30    24 20180724
# 4 2018-07-25 0.361  14458661 0chain     17737  2018     7    30    25 20180725
# 5 2018-07-26 0.378  15130036 0chain     17738  2018     7    30    26 20180726
# 6 2018-07-27 0.350  14002849 0chain     17739  2018     7    30    27 20180727
# 7 2018-07-28 0.351  14040452 0chain     17740  2018     7    30    28 20180728
# 8 2018-07-29 0.386  15437126 0chain     17741  2018     7    31    29 20180729
# 9 2018-07-30 0.370  14784582 0chain     17742  2018     7    31    30 20180730
# 10 2018-07-31 0.342  13675481 0chain     17743  2018     7    31    31 20180731
# 11 2018-08-01 0.323  12922649 0chain     17744  2018     8    31     1 20180801
# 12 2018-08-02 0.287  11489610 0chain     17745  2018     8    31     2 20180802
# 13 2018-08-03 0.248   9907543 0chain     17746  2018     8    31     3 20180803
# 14 2018-08-04 0.247   9896481 0chain     17747  2018     8    31     4 20180804
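One caveat: `group_by(ww)` pools every row that shares a week number. If the same number can recur in the data (for example in another year or for another coin), a partial run could be kept alongside a complete one. Below is a minimal base-R sketch that filters on the length of each individual run instead. The data frame `df_toy` and its `val` column are hypothetical, and it assumes the rows are already sorted by date:

```r
# Hypothetical toy data: week 30 appears once as a full week and
# later again as a partial (2-row) run
df_toy <- data.frame(
  ww  = c(rep(29, 3), rep(30, 7), rep(31, 7), rep(30, 2)),
  val = seq_len(19)
)

runs <- rle(df_toy$ww)

# Length of the run each row belongs to, repeated once per row
run_len <- rep(runs$lengths, times = runs$lengths)

# Keep only rows that sit in a run of at least 7 consecutive equal ww values
complete <- df_toy[run_len >= 7, ]
nrow(complete)   # 14
```

Here the trailing two rows of week 30 are dropped even though an earlier run of week 30 is complete.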