7

I have below mentioned dataframe:

Month         Val1          Val2      Part1      Part2
Feb-17        250           10000     25         100
A             25            2000      5          20
B             25            2000      5          20
C             100           2000      5          20
D             10            2000      5          20
E             40            2000      5          20
Jan-17        200           8000     50         100
A             20            1000      10         20
B             40            1600      10         20
C             100           1600      10         20
D             20            1600      10         20
E             20            2200      10         20

I want to add two columns Per_1 after Val1 and Per_2 after Val2. Which compare Feb-17 Data with Jan-17 Data and give growth percentage (increase or decrease).

Note: For A,B,C,D and E the sequence would be differ in Feb-17 and Jan-17, here i want to calculate growth of Feb-17 A with Jan-17 A and respectively.

In case of Jan-17, since there are no Data in last month to compare with in dataframe, the Per_1 and Per_2 should be 0.00% in that case.

Required Output:

Month   Val1    Per_1   Val2    Per_2   Part1   Part2
Feb-17  250     25.00%  10000   25.00%  25  100
A       25      25.00%  2000    100.00% 5   20
B       25     -37.50%  2000    25.00%  5   20
C       100     0.00%   2000    25.00%  5   20
D       10     -50.00%  2000    25.00%  5   20
E       40      100.00% 2000    -9.09%  5   20
Jan-17  200     0.00%   8000    0.00%   50  100
C       100     0.00%   1600    0.00%   10  20
A       20      0.00%   1000    0.00%   10  20
B       40      0.00%   1600    0.00%   10  20
E       20      0.00%   2200    0.00%   10  20
D       20      0.00%   1600    0.00%   10  20
Roy1245
  • 507
  • 4
  • 18
  • 1
    First of all, you should change the format of your data.frame to do this operation. The reason is simple: the rows represent different individuals since row 1 refers to an aggregated sample whereas rows 2 to 6 refer to individual samples. Once you have a consistent data.frame (meaning all the rows represent individual of same nature) you can simply perform a group by based on variable month and compute the relative change using formulas to compute difference between consecutive rows like `lag`, `shift` or `diff` – Seymour May 06 '18 at 14:58
  • @Seymour Can't we do these operations with multiple if else conditions.?? – Roy1245 May 06 '18 at 15:29
  • you probably can but such approach would complicate your life. However, can you provide a reproducible example? – Seymour May 06 '18 at 15:31
  • @Seymour Here is the reproducible example: `structure(list(Month = c("Feb-17", "A", "B", "C", "D", "E", "Jan-17", "A", "B", "C", "D", "E"), Val1 = c(250L, 25L, 25L, 100L, 10L, 40L, 200L, 20L, 40L, 100L, 20L, 20L), Val2 = c(10000L, 2000L, 2000L, 2000L, 2000L, 2000L, 8000L, 1000L, 1600L, 1600L, 1600L, 2200L), Part1 = c(25L, 5L, 5L, 5L, 5L, 5L, 50L, 10L, 10L, 10L, 10L, 10L), Part2 = c(100L, 20L, 20L, 20L, 20L, 20L, 100L, 20L, 20L, 20L, 20L, 20L)), class = "data.frame", row.names = c(NA, -12L))` – Roy1245 May 06 '18 at 15:34
  • @Seymour For the sake of simplicity i have took only two months data here but actual dataframe consist of 25-30 months. – Roy1245 May 06 '18 at 15:35
  • Why the order of variable `Month` in the required output is different? It appears: C A B E D instead of A B C D E – Seymour May 06 '18 at 15:37
  • @Seymour Because in the original dataframe the sequence is decided based on value of `Val1` in decreasing order. (The same is not show here in example dataframe). – Roy1245 May 06 '18 at 15:43
  • What have you tried so far? – andrew_reece May 06 '18 at 16:34
  • @andrew_reece I have tried: `mutate("Per_1" = round(apply(cbind(Val1, lag(- Val1)), 1, sum, na.rm = TRUE) / lag(Val1) * 100, 2), "Per_2" = round(apply(cbind(Val2, lag(- Val2)), 1, sum, na.rm = TRUE) / lag(Val2) * 100, 2)) %>% ` but it didn't work. – Roy1245 May 06 '18 at 16:39
  • The code you've tried so far belongs in your question, not comments, as does your `dput` output – camille May 06 '18 at 22:30
  • @camille Thanks... Please try to help me out with this. – Roy1245 May 07 '18 at 03:13
  • See [here](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) on how to setup your question so that things like code, data, and explanations of the problem are all legible and in the body of the post, not in comments – camille May 07 '18 at 14:45
  • 1
    I'm still trying to understand your column `month`... – patL May 09 '18 at 10:48

1 Answers1

6

This can be done with dplyr and lead

I made a seperate column ID that replaced each month by a constant value so that we could then group_by and have a consistent category from month to month. Then I created the Per_1 and Per_2 columns by using lead. Last I select the columns that you want returned not including any of the intermediate columns.

library(dplyr)
df %>% 
  mutate(ID = ifelse(grepl("\\d",Month),"First",Month)) %>% 
  group_by(ID) %>% 
  mutate(Val1 = as.numeric(Val1),
         Val2 = as.numeric(Val2), 
         next.month1 = lead(Val1, order_by=ID) ,
         next.month2 = lead(Val2, order_by=ID), 
         Per_1 = 100*(Val1-next.month1)/next.month1,
         Per_2 = 100*(Val2-next.month2)/next.month2) %>%
  ungroup() %>% 
  select(Month,Val1,Per_1,Val2,Per_2,Part1,Part2)


# A tibble: 12 x 7
#   Month   Val1  Per_1  Val2   Per_2 Part1 Part2
#   <fct>  <dbl>  <dbl> <dbl>   <dbl> <int> <int>
# 1 Feb-17 250     25.0 10000   25.0     25   100
# 2 A       25.0   25.0  2000  100        5    20
# 3 B       25.0 - 37.5  2000   25.0      5    20
# 4 C      100      0    2000   25.0      5    20
# 5 D       10.0 - 50.0  2000   25.0      5    20
# 6 E       40.0  100    2000 -  9.09     5    20
# 7 Jan-17 200     NA    8000   NA       50   100
# 8 C      100     NA    1600   NA       10    20
# 9 A       20.0   NA    1000   NA       10    20
#10 B       40.0   NA    1600   NA       10    20
#11 E       20.0   NA    2200   NA       10    20
#12 D       20.0   NA    1600   NA       10    20
jasbner
  • 2,253
  • 12
  • 24
  • Thanks...But I want keep sequence of alphabet ( C,A,B,E and D) in the same order in `Jan-17` with same value. – Roy1245 May 08 '18 at 18:32
  • 1
    your required output has different order than your input dataframe. how do you determine that order? The order is not changed when you apply this function. I updated it considering a different input function. – jasbner May 08 '18 at 18:39