6

For example, suppose you have the following dataframe:

ID<-c("11", "12", "13", "14", "14")
Date<-c("2020-01-01", "2020-02-01", "2020-03-15", "2020-04-10", "2020-06-01")
Item<-c("Item1", "Item1", "Item2", "Item2", "Item2")
ItemPrice<-c(5, 5, 7, 7, 7)
Quantity<-c(1, 2, -2, 2, 3)
Cost<-c(5, 10, -14, 14, 21)
df<-data.frame(ID, Date, Item, ItemPrice, Quantity, Cost)
df

  ID       Date  Item ItemPrice Quantity Cost
1 11 2020-01-01 Item1         5        1    5
2 12 2020-02-01 Item1         5        2   10
3 13 2020-03-15 Item2         7       -2  -14
4 14 2020-04-10 Item2         7        2   14
5 14 2020-06-01 Item2         7        3   21

However, you wanted to separate the rows by Quantity so each one represents an individual sale like the following:

   ID       Date  Item ItemPrice Quantity Cost
1  11 2020-01-01 Item1         5        1    5
2  12 2020-02-01 Item1         5        1    5
3  12 2020-02-01 Item1         5        1    5
4  13 2020-03-15 Item2         7       -1   -7
5  13 2020-03-15 Item2         7       -1   -7
6  14 2020-04-10 Item2         7        1    7
7  14 2020-04-10 Item2         7        1    7
8  14 2020-06-01 Item2         7        1    7
9  14 2020-06-01 Item2         7        1    7
10 14 2020-06-01 Item2         7        1    7

How could this be achieved?

ThomasIsCoding
  • 96,636
  • 9
  • 24
  • 81
GM01
  • 237
  • 1
  • 4

5 Answers5

8

Create a count column with the absolute value of 'Quantity', change the 'Quantity' to sign of it, replace the 'Cost' by dividing it with 'cnt' column, and then replicate the rows with 'cnt' column

library(dplyr)
library(tidyr)
df %>% 
   mutate(cnt = abs(Quantity), Quantity = sign(Quantity), 
         Cost = Cost/cnt) %>%
   uncount(cnt) %>%
   as_tibble

-output

# A tibble: 10 x 6
#   ID    Date       Item  ItemPrice Quantity  Cost
#   <chr> <chr>      <chr>     <dbl>    <dbl> <dbl>
# 1 11    2020-01-01 Item1         5        1     5
# 2 12    2020-02-01 Item1         5        1     5
# 3 12    2020-02-01 Item1         5        1     5
# 4 13    2020-03-15 Item2         7       -1    -7
# 5 13    2020-03-15 Item2         7       -1    -7
# 6 14    2020-04-10 Item2         7        1     7
# 7 14    2020-04-10 Item2         7        1     7
# 8 14    2020-06-01 Item2         7        1     7
# 9 14    2020-06-01 Item2         7        1     7
#10 14    2020-06-01 Item2         7        1     7
akrun
  • 874,273
  • 37
  • 540
  • 662
7

Here is a base R way. I was going to close as duplicate of this question but there are small differences. The main instruction, the lapply loop is from this answer.

df$Cost <- df$Cost/abs(df$Quantity)
df <- as.data.frame(lapply(df, rep, abs(df$Quantity)))
df$Quantity <- sign(df$Quantity)

df
#   ID       Date  Item ItemPrice Quantity Cost
#1  11 2020-01-01 Item1         5        1    5
#2  12 2020-02-01 Item1         5        1    5
#3  12 2020-02-01 Item1         5        1    5
#4  13 2020-03-15 Item2         7       -1   -7
#5  13 2020-03-15 Item2         7       -1   -7
#6  14 2020-04-10 Item2         7        1    7
#7  14 2020-04-10 Item2         7        1    7
#8  14 2020-06-01 Item2         7        1    7
#9  14 2020-06-01 Item2         7        1    7
#10 14 2020-06-01 Item2         7        1    7
Rui Barradas
  • 70,273
  • 8
  • 34
  • 66
6

A data.table option

> setDT(df)[, lapply(.SD, function(x) rep(x / abs(Quantity), abs(Quantity))), ID:ItemPrice]     
    ID       Date  Item ItemPrice Quantity Cost
 1: 11 2020-01-01 Item1         5        1    5
 2: 12 2020-02-01 Item1         5        1    5
 3: 12 2020-02-01 Item1         5        1    5
 4: 13 2020-03-15 Item2         7       -1   -7
 5: 13 2020-03-15 Item2         7       -1   -7
 6: 14 2020-04-10 Item2         7        1    7
 7: 14 2020-04-10 Item2         7        1    7
 8: 14 2020-06-01 Item2         7        1    7
 9: 14 2020-06-01 Item2         7        1    7
10: 14 2020-06-01 Item2         7        1    7
ThomasIsCoding
  • 96,636
  • 9
  • 24
  • 81
3

I know this may sound a bit adventurous, But I like to constantly challenge myself to alternative solutions. Thought you might be interested:

library(dplyr)
library(tidyr)

df %>%
  rowwise() %>%
  mutate(Quantity = map(Quantity, ~ rep(.x/abs(.x), abs(.x)))) %>%
  unnest_longer(Quantity) %>%
  add_count(ID, Date) %>%
  mutate(Cost = Cost / n) %>%
  select(-n)

# A tibble: 10 x 6
   ID    Date       Item  ItemPrice Quantity  Cost
   <chr> <chr>      <chr>     <dbl>    <dbl> <dbl>
 1 11    2020-01-01 Item1         5        1     5
 2 12    2020-02-01 Item1         5        1     5
 3 12    2020-02-01 Item1         5        1     5
 4 13    2020-03-15 Item2         7       -1    -7
 5 13    2020-03-15 Item2         7       -1    -7
 6 14    2020-04-10 Item2         7        1     7
 7 14    2020-04-10 Item2         7        1     7
 8 14    2020-06-01 Item2         7        1     7
 9 14    2020-06-01 Item2         7        1     7
10 14    2020-06-01 Item2         7        1     7
Anoushiravan R
  • 21,622
  • 3
  • 18
  • 41
2

With dplyr only. You can use rowwise() and slice with a helper ntimescolumn

 library(dplyr)

 df %>% 
    mutate(ntimes = abs(Quantity)) %>% 
    rowwise() %>% 
    slice(rep(1:n(), each=abs(Quantity))) %>% 
    mutate(Cost = Cost/ntimes,
           Quantity = Quantity/ntimes) %>% 
    select(-ntimes)

Output:

   ID    Date       Item  ItemPrice Quantity  Cost
   <chr> <chr>      <chr>     <dbl>    <dbl> <dbl>
 1 11    2020-01-01 Item1         5        1     5
 2 12    2020-02-01 Item1         5        1     5
 3 12    2020-02-01 Item1         5        1     5
 4 13    2020-03-15 Item2         7       -1    -7
 5 13    2020-03-15 Item2         7       -1    -7
 6 14    2020-04-10 Item2         7        1     7
 7 14    2020-04-10 Item2         7        1     7
 8 14    2020-06-01 Item2         7        1     7
 9 14    2020-06-01 Item2         7        1     7
10 14    2020-06-01 Item2         7        1     7
Ben
  • 28,684
  • 5
  • 23
  • 45
TarJae
  • 72,363
  • 6
  • 19
  • 66