5
  • I like to compute 'Percentage of change between two rows' while Item changes in rows.
  • For first row of items, I am ok to place some text or blank (example ‘skip’ or blank is ok) because of no comparison of previous row for same item

    df <- data.frame(period=rep(1:4,2), 
                 farm=c(rep('A',4),rep('B',4)), 
                 cumVol=c(1,5,15,31,10,12,16,24), item=c(rep('Z',3),rep('Y',3),rep('X',2)),
                 other = 1:8);
    

My desired out put of 'Percentage_Change' column (additional column to dataset) would be like below. (Formula: 100/Previous Item row value * current row Item value...Example for 2nd row..100/1 * 5 = 500)

Row Percentage_Change
1   Skip
2   500
3   300
4   Skip
5   32.25806452
6   120
7   Skip
8   150

Thanks for all your help.

Murali
  • 579
  • 1
  • 6
  • 20

2 Answers2

5

The same can be done with dplyr. This is completely analogous to akrun's answer, but uses other syntax.

library(dplyr)
mutate(df, Row = 1:n()) %>%
  group_by(item) %>%
  mutate(Percentage_Change = cumVol/lag(cumVol) * 100) %>%
  ungroup %>%
  select(Row, Percentage_Change)
## Source: local data frame [8 x 2]
## 
##     Row Percentage_Change
##   (int)             (dbl)
## 1     1                NA
## 2     2         500.00000
## 3     3         300.00000
## 4     4                NA
## 5     5          32.25806
## 6     6         120.00000
## 7     7                NA
## 8     8         150.00000
Community
  • 1
  • 1
Stibu
  • 15,166
  • 6
  • 57
  • 71
2

We can use shift from data.table. Convert the 'data.frame' to 'data.table' (setDt(df)), grouped by 'item', we get the "Row" from .I and create the "Percentage_Change" by dividing the "cumVol" by the lag of "cumVol" (got by shift) and multiplying by 100. If needed the grouping column can be removed by assigning (:=) it to NULL.

library(data.table)
setDT(df)[, list(Row = .I, Percentage_Change=round(cumVol*
   (100/shift(cumVol)),2)), .(item)][, item := list(NULL)][]
#   Row Percentage_Change
#1:   1               NA
#2:   2           500.00
#3:   3           300.00
#4:   4               NA
#5:   5            32.26
#6:   6           120.00
#7:   7               NA
#8:   8           150.00

NOTE: This gives NA for elements where there are no comparison, instead of the Skip.

akrun
  • 874,273
  • 37
  • 540
  • 662
  • @Akrun..can you help me in 5th value please? I did overlooked that value. – Murali Mar 20 '16 at 07:04
  • 1
    @Murali It is changed. Please check – akrun Mar 20 '16 at 07:04
  • 1
    I just tested the code on my big dataset and worded perfect for less than 100 values. Great job and I love you :-) – Murali Mar 20 '16 at 07:11
  • How could you apply this to categories -1 and 1 for `% change <= .8` and `% change >= 1.2` and also to look back at values for the past hour instead of just the last row – zsad512 Sep 07 '17 at 00:55
  • @zsad512 Please post as a new question as your comment is not clear to me – akrun Sep 07 '17 at 04:52
  • @akrun https://stackoverflow.com/questions/46084728/breaking-a-continuous-variable-into-categories-using-dplyr-and-or-cut/46086107?noredirect=1#comment79135217_46086107 – zsad512 Sep 07 '17 at 13:20