1

I work with a data frame that that looks like that:

enter image description here

    DATUM               CP                SMER  TRH   MNOZSTVI  CENA POPLATKY   OBJEM UCET  KVARTAL   ROK AKTUALNI.MNOZSTVI
   <dttm>              <chr>             <chr> <chr>    <dbl> <dbl>    <dbl>   <dbl> <chr> <chr>   <dbl>             <dbl>
 1 2020-03-03 00:00:00 CEZ               K     BCPP        50 465.      91.3 -23240  CZK   Q1       2020                NA
 2 2020-03-04 00:00:00 CEZ               K     BCPP        50 467.      58.9 -13980  CZK   Q1       2020                NA
 3 2020-03-12 00:00:00 CEZ               P     BCPP        30 398       51.8  11940  CZK   Q1       2020                NA
 4 2020-03-25 00:00:00 KOMERCNI BANKA    K     BCPP        40 542       85.9 -21680  CZK   Q1       2020                NA
 5 2020-03-25 00:00:00 MONETA MONEY BANK K     BCPP       300  58.4     71.3 -17505  CZK   Q1       2020                NA
 6 2020-03-30 00:00:00 CEZ               K     BCPP        10 391       50    -3910  CZK   Q1       2020                NA
 7 2020-04-02 00:00:00 USD               K     NA        1000  25.8      0   -25778  CZK   Q2       2020                NA
 8 2020-04-03 00:00:00 USD               K     NA        3000  26.1      0   -78392  CZK   Q2       2020                NA
 9 2020-04-04 00:00:00 USD               K     NA        1000  26.4      0   -26363. CZK   Q2       2020                NA
10 2020-04-06 00:00:00 AVAST             K     BCPP       150 125.      75.8 -18810  CZK   Q2       2020                NA

And I would like to fill cumulative sum of variable MNOZSTVI into variable AKTUALNI.MNOZSTVI grouped by CP. So vector AKTUALNI.MNOZSTVI should be c(50,100,130,40,300,140,1000,4000,5000,150, etc.).

The problem is that some values of MNOZSTVI are missing and so I dont know how to use function cumsun() that can't cope with missing values + I struggle to perform it for grouped data.

Does anybody know how to do that either woth the help of cumsum() or some other function? Thank you.

Ronak Shah
  • 377,200
  • 20
  • 156
  • 213
JEŠTĚR
  • 53
  • 4

2 Answers2

1

We can group by 'CP' and get the cumsum of 'MNOZSTVI' in mutate

library(dplyr)
df1 <- df1 %>%
     group_by(CP) %>%
     mutate(AKTUALNI.MNOZSTVI  = cumsum(MNOZSTVI))

Or use base R with ave

df1$AKTUALNI.MNOZSTVI <- with(df1, ave(MNOZSTVI, CP, FUN = cumsum))
akrun
  • 874,273
  • 37
  • 540
  • 662
1
library(dplyr)
df %>%
  group_by(CP) %>%
  mutate(AKTUALNI.MNOZSTVI  = cumsum(MNOZSTVI))

Output:

   DATUM      CP                         SMER  TRH   MNOZSTVI  CENA POPLATKY OBJEM UCET  KVARTAL ROK.AKTUALNI..MNOZSTVI AKTUALNI.MNOZSTVI
   <chr>      <chr>                      <chr> <chr>    <int> <dbl> <chr>    <chr> <chr> <chr>   <chr>                              <int>
 1 2020-03-03 00:00:00 CEZ               K     BCPP        50 465   91.3     NA    CZK   Q1      2020 NA                               50
 2 2020-03-04 00:00:00 CEZ               K     BCPP        50 467   58.9     NA    CZK   Q1      2020 NA                              100
 3 2020-03-12 00:00:00 CEZ               P     BCPP        30 398   51.8     11940 CZK   Q1      2020                                 130
 4 2020-03-25 00:00:00 KOMERCNI BANKA    K     BCPP        40 542   85.9     -     CZK   Q1      2020                                  40
 5 2020-03-25 00:00:00 MONETA MONEY BANK K     BCPP       300  58.4 71.3     -     CZK   Q1      2020                                 300
 6 2020-03-30 00:00:00 CEZ               K     BCPP        10 391   50       -     CZK   Q1      2020                                 140
 7 2020-04-02 00:00:00 USD               K     NA        1000  25.8 0        -     CZK   Q2      2020                                1000
 8 2020-04-03 00:00:00 USD               K     NA        3000  26.1 0        -     CZK   Q2      2020                                4000
 9 2020-04-04 00:00:00 USD               K     NA        1000  26.4 0        -     CZK   Q2      2020                                5000
10 2020-04-06 00:00:00 AVAST             K     BCPP       150 125   75. 8    -     CZK   Q2      2020                                 150

data:

df <- tibble::tribble(
        ~DATUM,                          ~CP, ~SMER,   ~TRH, ~MNOZSTVI, ~CENA, ~POPLATKY,  ~OBJEM, ~UCET, ~KVARTAL, ~ROK.AKTUALNI..MNOZSTVI,
  "2020-03-03",               "00:00:00 CEZ",   "K", "BCPP",       50L,   465,    "91.3",      NA, "CZK",     "Q1",               "2020 NA",
  "2020-03-04",               "00:00:00 CEZ",   "K", "BCPP",       50L,   467,    "58.9",      NA, "CZK",     "Q1",               "2020 NA",
  "2020-03-12",               "00:00:00 CEZ",   "P", "BCPP",       30L,   398,    "51.8", "11940", "CZK",     "Q1",                  "2020",
  "2020-03-25",    "00:00:00 KOMERCNI BANKA",   "K", "BCPP",       40L,   542,    "85.9",     "-", "CZK",     "Q1",                  "2020",
  "2020-03-25", "00:00:00 MONETA MONEY BANK",   "K", "BCPP",      300L,  58.4,    "71.3",     "-", "CZK",     "Q1",                  "2020",
  "2020-03-30",               "00:00:00 CEZ",   "K", "BCPP",       10L,   391,      "50",     "-", "CZK",     "Q1",                  "2020",
  "2020-04-02",               "00:00:00 USD",   "K",     NA,     1000L,  25.8,       "0",     "-", "CZK",     "Q2",                  "2020",
  "2020-04-03",               "00:00:00 USD",   "K",     NA,     3000L,  26.1,       "0",     "-", "CZK",     "Q2",                  "2020",
  "2020-04-04",               "00:00:00 USD",   "K",     NA,     1000L,  26.4,       "0",     "-", "CZK",     "Q2",                  "2020",
  "2020-04-06",             "00:00:00 AVAST",   "K", "BCPP",      150L,   125,   "75. 8",     "-", "CZK",     "Q2",                  "2020"
  )
TarJae
  • 72,363
  • 6
  • 19
  • 66