1

I have two dataframes df1 and df2, i want to groupby df1$delivery and create subset of df2 where one column df2$destination is equal df1$facility column and another df2$source column is not equal to same df1$facility and sum the df2$distance column

df1

facility delivery vol
ludhiana  abohar 123
delhi     abohar 234
mumbai    abohar 345
delhi     Adampu 739
jaipur    Adampu 21757
lucknow   Adampu 37449

df2

 source    destination dist
   delhi    ludhiana    10
   mumbai   ludhiana    5
   ludhiana delhi       10
   mumbai   delhi       5
   ludhiana mumbai      5
   delhi    mumbai      5
   delhi    jaipur      5
   jaipur   delhi       5
   delhi    lucknow     10
   lucknow  delhi       10
   jaipur   lucknow     5
   lucknow  jaipur      5

output expected

facility delivery vol   pan_india_dist
ludhiana  abohar  123   10
delhi     abohar  234   15
mumbai    abohar  345   10
delhi     Adampu  739   15
jaipur    Adampu  21757 10
lucknow   Adampu  37449 15

x <- sapply(data4$facility,function(i)sum((mh_mh$destination[data4$facility %in% i]) & (mh_mh$source[!data4$facility %in% i]))

I had tried this formula but it is throwing not meaningful factors error

Vasuki Rao
  • 37
  • 3
  • can you explain how you get value 10 for first row? – Ronak Shah Jun 05 '19 at 03:02
  • Can you use `dput` or `reprex` to make a reproducible example? It will make it easier for people to help you with your problem. https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example – william3031 Jun 05 '19 at 03:46
  • sorry the first row is suppose to be 15, Logic for arriving at value 15 in first row is as below, for each facility name in df1 ,ex: ludhiana, filter df2$destination = ludhiana and df2$source != ludhiana, and add all the values in df2$dist column, groupby df1$delivery – Vasuki Rao Jun 06 '19 at 06:50

1 Answers1

0

An option with rap and tidyverse

library(rap)
library(tidyverse)
df1 %>% 
    rap(x = ~ filter(df2, destination == facility, source != facility)) %>% 
    mutate(x = map2(x, delivery, ~ 
             .x %>% 
               filter(source %in% df1$facility[df1$delivery == .y])  %>% 
               summarise(pan_india_dist = sum(dist)))) %>%
    unnest
#  facility delivery   vol pan_india_dist
#1 ludhiana   abohar   123             15
#2    delhi   abohar   234             15
#3   mumbai   abohar   345             10
#4    delhi   Adampu   739             15
#5   jaipur   Adampu 21757             10
#6  lucknow   Adampu 37449             15
akrun
  • 874,273
  • 37
  • 540
  • 662
  • not able to find rap library, I have installed RAP library but the function doesnt belong to that package – Vasuki Rao Jun 06 '19 at 03:43
  • Error in Ops.factor(destination, facility) : level sets of factors are different – Vasuki Rao Jun 06 '19 at 06:49
  • @VasukiRao I have the dataset columns as 'character' instead of factor for the faciliyt, delivery, destination, source etc – akrun Jun 06 '19 at 06:51
  • Quick top up on the same question, what change would be required in the code, if i want to take product of vol * dist before summation. ie. sum(vol X product) which will give be weighted distance – Vasuki Rao Jun 06 '19 at 10:21
  • @VasukiRao shouldn't the `sum(vol * dist)` works in `summarise` – akrun Jun 06 '19 at 14:03
  • Did try the same but apparently the result is coming way off, the actual result for abohar - ludhiana pair for mumbai and delhi is suppose to be sum((234*10)+(345*5)) = 4065, however if i do (dist * vol) i get result way off such as 414360 – Vasuki Rao Jun 10 '19 at 07:00