0

In R, I have this primary Order table:

enter image description here

And I have to build this second table which has the steps.First order has 3 steps and the second order has 4 steps.And I am looking for ways how to populate the Input and Output Quantity fields :

enter image description here

The InputQuantity of the first order 'Product1_Slit_Product2' should be the InputQuantity of its first OrderStep 'Product1_Slit_Product2_1'.

The OutputQuantity of the first order 'Product1_Slit_Product2' should be the OutputQuantity of its last OrderStep 'Product1_Slit_Product2_3.

Like this:

enter image description here

And the empty fields need to be populated with any value which lies between the InputQuantity and OutputQuantity, but gradually decreasing.. for example like this:

enter image description here

Please help me build this second table in R with respect to populating the Input and OutputQuantity according to the first table. Any pointers/tips would help this beginner. Thanks in advance.

Pree
  • 77
  • 10
  • 1
    Please post [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) – pogibas Oct 04 '17 at 13:58

1 Answers1

2

We can use functions from dplyr, tidyr, and imputeTS packages.

First, let's create a data frame similar to your example.

# Create example data frame
dt <- data_frame(Order = c(rep(1, 3), rep(2, 4)),
                 OrderSteps = c(1:3, 1:4),
                 InputQuantity = c(300.56, NA, NA, 65.89, NA, NA, NA),
                 OutputQuantity = c(NA, NA, 243.65, NA, NA, NA, 12.54))

dt
# A tibble: 7 x 4
  Order OrderSteps InputQuantity OutputQuantity
  <dbl>      <int>         <dbl>          <dbl>
1     1          1        300.56             NA
2     1          2            NA             NA
3     1          3            NA         243.65
4     2          1         65.89             NA
5     2          2            NA             NA
6     2          3            NA             NA
7     2          4            NA          12.54

Second, convert the data frame from wide format to long format

dt2 <- dt %>%
  gather(Type, Value, InputQuantity:OutputQuantity) %>%
  arrange(Order, OrderSteps)

dt2
# A tibble: 14 x 4
   Order OrderSteps           Type  Value
   <dbl>      <int>          <chr>  <dbl>
 1     1          1  InputQuantity 300.56
 2     1          1 OutputQuantity     NA
 3     1          2  InputQuantity     NA
 4     1          2 OutputQuantity     NA
 5     1          3  InputQuantity     NA
 6     1          3 OutputQuantity 243.65
 7     2          1  InputQuantity  65.89
 8     2          1 OutputQuantity     NA
 9     2          2  InputQuantity     NA
10     2          2 OutputQuantity     NA
11     2          3  InputQuantity     NA
12     2          3 OutputQuantity     NA
13     2          4  InputQuantity     NA
14     2          4 OutputQuantity  12.54

Third, filter the data frame and then impute the NA.

dt3 <- dt2 %>%
  filter(Type %in% "OutputQuantity" | !is.na(Value)) %>%
  mutate(Value = na.interpolation(Value))

dt3
# A tibble: 9 x 4
  Order OrderSteps           Type    Value
  <dbl>      <int>          <chr>    <dbl>
1     1          1  InputQuantity 300.5600
2     1          1 OutputQuantity 281.5900
3     1          2 OutputQuantity 262.6200
4     1          3 OutputQuantity 243.6500
5     2          1  InputQuantity  65.8900
6     2          1 OutputQuantity  52.5525
7     2          2 OutputQuantity  39.2150
8     2          3 OutputQuantity  25.8775
9     2          4 OutputQuantity  12.5400

Finally, merge dt2 and dt3, and then fill the NA with the previous record. After that, convert the data frame back to wide format.

dt4 <- dt2 %>%
  left_join(dt3, by = c("Order", "OrderSteps", "Type")) %>%
  fill(Value.y) %>%
  select(-Value.x) %>%
  spread(Type, Value.y)
dt4
# A tibble: 7 x 4
  Order OrderSteps InputQuantity OutputQuantity
* <dbl>      <int>         <dbl>          <dbl>
1     1          1      300.5600       281.5900
2     1          2      281.5900       262.6200
3     1          3      262.6200       243.6500
4     2          1       65.8900        52.5525
5     2          2       52.5525        39.2150
6     2          3       39.2150        25.8775
7     2          4       25.8775        12.5400

dt4 is the final output.

www
  • 38,575
  • 12
  • 48
  • 84
  • Thank you for the detailed answer ycw! I will try this out. – Pree Oct 05 '17 at 06:16
  • Executing the last dt4 step leads me to this error : Error in select(., -Value.x) : unused argument (-Value.x) . I have tried several ideas, but not getting the nail to hit in this very last step.Could you ycw point out what is wrong here? Thanks. – Pree Oct 09 '17 at 09:22
  • @Pree Make sure you are using the `select` function from `dplyr`, not from other packages. This could be helpful: https://stackoverflow.com/questions/24202120/dplyrselect-function-clashes-with-massselect – www Oct 09 '17 at 10:06
  • This worked like a charm! Thanks a ton, you made my day! :) – Pree Oct 09 '17 at 12:04