I have a data object similar to the following:

> temp2 %>% arrange(date_val) %>% select(date_val,kpi_name,kpi_value)
# Source:     spark<?> [?? x 3]
# Ordered by: date_val
   date_val            kpi_name              kpi_value
   <dttm>              <chr>                     <dbl>
 1 2018-12-04 00:00:00 KPI1                          0
 2 2018-12-04 00:00:00 KPI2                         38
 3 2018-12-04 00:01:00 KPI2                         55
 4 2018-12-04 00:01:00 KPI1                          1
 5 2018-12-04 00:02:00 KPI2                         55
 6 2018-12-04 00:02:00 KPI1                          1
 7 2018-12-04 00:03:00 KPI1                          0
 8 2018-12-04 00:03:00 KPI2                         58
 9 2018-12-04 00:04:00 KPI2                         45
10 2018-12-04 00:04:00 KPI1                          1
# … with more rows

I would like to insert a new row for each date_val group that performs a calculation on the kpi_name/kpi_value pairs already present for that date_val. For example, suppose I need to compute a new KPI3 as 100*(KPI1/KPI2), which would produce a new data object such as:

# Source:     spark<?> [?? x 3]
# Ordered by: date_val
   date_val            kpi_name              kpi_value
   <dttm>              <chr>                     <dbl>
 1 2018-12-04 00:00:00 KPI1                          0
 2 2018-12-04 00:00:00 KPI2                         38
 3 2018-12-04 00:00:00 KPI3                          0
 4 2018-12-04 00:01:00 KPI2                         55
 5 2018-12-04 00:01:00 KPI1                          1
 6 2018-12-04 00:01:00 KPI3                       1.82
 7 2018-12-04 00:02:00 KPI2                         55
 8 2018-12-04 00:02:00 KPI1                          1
 9 2018-12-04 00:02:00 KPI3                       1.82
10 2018-12-04 00:03:00 KPI1                          0
11 2018-12-04 00:03:00 KPI2                         58
12 2018-12-04 00:03:00 KPI3                          0
13 2018-12-04 00:04:00 KPI2                         45
14 2018-12-04 00:04:00 KPI1                          1
15 2018-12-04 00:04:00 KPI3                       2.22
# … with more rows

Can this be done in dplyr?

Jeff Kraus

1 Answer

This should do it:

library(tidyverse)

temp2 %>% 
  spread(kpi_name, kpi_value) %>%         # wide: one column per KPI
  mutate(KPI3 = 100*(KPI1/KPI2)) %>%      # derived KPI as a new column
  gather(kpi_name, kpi_value, -date_val)  # back to long format

While it's technically possible to rbind new rows in, that approach is comparatively inefficient and syntactically clunky. It makes much more sense to transform to the logical wide format, add the column, and then transform back to long.
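As a sanity check, here's the same spread → mutate → gather round-trip on a small local tibble mirroring the question's data (the sample values are invented, and `spread`/`gather` have since been superseded by `pivot_wider`/`pivot_longer`, but they work as shown):

```r
library(dplyr)
library(tidyr)

# Toy local stand-in for the Spark table in the question (values invented)
temp2_local <- tibble(
  date_val  = rep(as.POSIXct(c("2018-12-04 00:00:00",
                               "2018-12-04 00:01:00"), tz = "UTC"), each = 2),
  kpi_name  = rep(c("KPI1", "KPI2"), times = 2),
  kpi_value = c(0, 38, 1, 55)
)

result <- temp2_local %>%
  spread(kpi_name, kpi_value) %>%             # wide: one column per KPI
  mutate(KPI3 = 100 * (KPI1 / KPI2)) %>%      # derived KPI as a new column
  gather(kpi_name, kpi_value, -date_val) %>%  # back to long format
  arrange(date_val, kpi_name)

# KPI3 at 00:00 is 100*(0/38) = 0; at 00:01 it is 100*(1/55), about 1.82
```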

Mako212
  • Thanks. Looks like the Spark object does not support the spread function, but I can convert the object with as_tibble first and it works as expected. – Jeff Kraus Jan 15 '19 at 19:27
  • @JeffKraus Great, also, this answer might help if you want to stay with the spark object: https://stackoverflow.com/questions/50465390/gather-in-sparklyr – Mako212 Jan 15 '19 at 19:30
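If converting to a local tibble isn't an option, one possible dplyr-only alternative (a sketch, shown here on a local tibble standing in for the Spark table; it uses only verbs that dplyr can translate to Spark SQL, with `ifelse` inside `sum` standing in for the spread) is to compute the KPI3 rows with a grouped summarise and union them back on:

```r
library(dplyr)

# Toy local stand-in for the Spark table (sample values invented)
temp2 <- tibble(
  date_val  = rep(as.POSIXct(c("2018-12-04 00:00:00",
                               "2018-12-04 00:01:00"), tz = "UTC"), each = 2),
  kpi_name  = rep(c("KPI1", "KPI2"), times = 2),
  kpi_value = c(0, 38, 1, 55)
)

# One KPI3 row per date_val: pick out KPI1 and KPI2 via conditional sums
kpi3_rows <- temp2 %>%
  group_by(date_val) %>%
  summarise(kpi_value = 100 * sum(ifelse(kpi_name == "KPI1", kpi_value, 0)) /
                              sum(ifelse(kpi_name == "KPI2", kpi_value, 0))) %>%
  mutate(kpi_name = "KPI3") %>%
  select(date_val, kpi_name, kpi_value)  # match column order for UNION ALL

with_kpi3 <- union_all(temp2, kpi3_rows)
```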