Summarize columns in R Data Frame independent of the order, (df$A,df$B) = (df$B,df$A)

Question

I have the following Data frame:


    
      destiny origin Count
    1 KJFK    SBBR       4
    2 KJFK    SAEZ    4683
    3 SBGL    KJFK       2
    4 SBBR    KJFK       2
    5 KJFK    SBGL    4987
    6 KJFK    SBGR   12911
    ...

As I'm interesting in the route, for me KJFK -> SBBR is the same as SBBR -> KJFK. So I want to sum their count, as the table below


    
      destiny origin Count
    1 KJFK    SBBR       6
    2 KJFK    SAEZ    4683
    3 SBGL    KJFK    4989
    4 KJFK    SBGR   12911
    ...

I did't want to use a big for loop to evaluate all the values

score 1 · Accepted Answer · answered Jul 10 '18 at 22:08

How about this?

library(tidyverse)
df %>%
    mutate_if(is.factor, as.character) %>%
    rowwise() %>%
    mutate(grp = paste0(sort(c(destiny, origin)), collapse = "_")) %>%
    ungroup() %>%
    group_by(grp) %>%
    summarise(Count = sum(Count)) %>%
    separate(grp, into = c("destiny", "origin"))
#        # A tibble: 4 x 3
#  destiny origin Count
#  <chr>   <chr>  <int>
#1 KJFK    SAEZ    4683
#2 KJFK    SBBR       6
#3 KJFK    SBGL    4989
#4 KJFK    SBGR   12911

Note that since you don't care about the order of destiny, origin, so here we order them alphabetically. So in the example you give above, KJFK -> SBBR and SBBR -> KJFK will become destiny = KJFK, origin = SBBR.

Sample data

df <- read.table(text =
    "  destiny origin Count
    1 KJFK    SBBR       4
    2 KJFK    SAEZ    4683
    3 SBGL    KJFK       2
    4 SBBR    KJFK       2
    5 KJFK    SBGL    4987
    6 KJFK    SBGR   12911", header =T)

It works fine :) but mutate_if(is.factor, as.character) only format the second column, so I transform the first on into character before this — gustavo Pacheco, Jul 13 '18 at 14:05
@gustavoPacheco The `mutate_if` works on all columns that are `factor` columns. Perhaps your first column was not a `factor`. — Maurits Evers, Jul 14 '18 at 09:13

akrun · Answer 2 · 2018-07-10T22:32:47.343

Here is an option with pmin/pmax

library(tidyverse)
df1 %>%       
  group_by(destinyN = pmin(destiny, origin), originN = pmax(destiny, origin)) %>% 
  summarise(destiny = first(destiny), 
            origin = first(origin), 
            Count = sum(Count)) %>%
  ungroup %>%
  select(-destinyN, -originN)
# A tibble: 4 x 3
#  destiny origin Count
#  <chr>   <chr>  <int>
#1 KJFK    SAEZ    4683
#2 KJFK    SBBR       6
#3 SBGL    KJFK    4989
#4 KJFK    SBGR   12911

data

df1 <- structure(list(destiny = c("KJFK", "KJFK", "SBGL", "SBBR", "KJFK", 
"KJFK"), origin = c("SBBR", "SAEZ", "KJFK", "KJFK", "SBGL", "SBGR"
), Count = c(4L, 4683L, 2L, 2L, 4987L, 12911L)), .Names = c("destiny", 
"origin", "Count"), row.names = c("1", "2", "3", "4", "5", "6"
), class = "data.frame")

Summarize columns in R Data Frame independent of the order, (df$A,df$B) = (df$B,df$A)

2 Answers2

Sample data

data

Linked