0

I have a data frame with the following feature:

one = c("A", "A", "B" ,"C" )
two = c("B", "C", "A", "A")
three = c(150, 0, 0, 160)
four = c(0, 200, 190, 0)

df <- data.frame(one, two, three, four)
df
  one two three four
1   A   B   150    0
2   A   C     0  200
3   B   A     0  190
4   C   A   160    0

I would like to combine the rows where the value of column one appears in column two and vice versa to:

  one two three four
1   A   B   150  200
2   A   C   190  160

Is there a way to do this in dplyr? Other suggestions are also greatly appreciated.

Axel
  • 150
  • 2
  • 7

1 Answers1

1

With data.table, you can do

library(data.table)
setDT(df)[one > two, c("one","two") := .(two, one)]
df[, lapply(.SD,sum), by=.(one,two)]

or with dplyr, the baroque analogue:

library(dplyr)
df %>% 
  mutate(
    bad = one > two, 
    one = ifelse(bad, two, one), 
    two = ifelse(bad, .$one, two), 
    bad = NULL) %>% 
  group_by(one,two) %>% 
  summarise_each(funs(sum))

The idea is to sort the first and second columns by swapping them where unordered. Taking pmin and pmax would be another way of sorting, covered in the linked question.

Frank
  • 66,179
  • 8
  • 96
  • 180