In my df, I treat c('apple', 'banana') and c('banana', 'apple') as the same, because the fruit types are the same and only the order differs.

How can I remove rows 1 and 2 and keep only the last row (wanted_df)?

df = data.frame(fruit1 = c('apple', 'banana', 'fig'),
                fruit2 = c('banana', 'apple', 'cherry'))
df

wanted_df = df[3,]

Any help will be highly appreciated!

============================

Something goes wrong with my real data.

frames2 loses the rows where lag = 2. The data frame I want should look like wanted_frames.

library(dplyr)

pollution1 = c('pm2.5', 'pm10', 'so2', 'no2', 'o3', 'co')
pollution2 = c('pm2.5', 'pm10', 'so2', 'no2', 'o3', 'co')
dis = 'n'
lag = 1:2

frames = expand.grid(pollution1 = pollution1, 
                     pollution2 = pollution2,
                     dis = dis, 
                     lag = lag) %>% 
  mutate(pollution1 = as.character(pollution1),
         pollution2 = as.character(pollution2), 
         dis = as.character(dis)) %>% 
  as_tibble() %>% 
  filter(pollution1 != pollution2)

vec <- with(frames, paste(pmin(pollution1, pollution2), pmax(pollution1, pollution2)))

frames2 = frames[!duplicated(vec), ]

wanted_frames = frames2 %>% mutate(lag = 2) %>% bind_rows(frames2)

zhiwei li
  • Could you show an expected output? How would you like `frames2` to appear? A manual example would help. – cmirian Feb 19 '21 at 08:35
  • @cmirian Hi, the last line of code, `wanted_frames`, is my expected output. – zhiwei li Feb 19 '21 at 08:37
  • `pollution1` and `pollution2` are identical. So if you apply a `filter` that omits duplicates, you are going to end up with zero rows. I am not entirely sure what you are trying to achieve. – cmirian Feb 19 '21 at 08:38

3 Answers

3

Try this.

library(dplyr)
d <- filter(df, !(fruit1 %in% fruit2) | !(fruit2 %in% fruit1))

Which gives

> d
  fruit1 fruit2
1    fig cherry

Update

As commented by @JonSpring and @Phil, the updated code should be

df %>% rowwise() %>% filter(!(fruit1 %in% fruit2) | !(fruit2 %in% fruit1)) %>% ungroup()
cmirian
  • Such a simple idea. Shouldn't it be `filter(df, !(fruit1 %in% fruit2) | !(fruit2 %in% fruit1))`? – Phil Feb 19 '21 at 07:30
  • Sure, thank you @Phil - updated accordingly. Have a great weekend. – cmirian Feb 19 '21 at 07:55
  • I don't believe this works in all cases, e.g. for `df = data.frame(fruit1 = c('apple', 'cherry', 'banana', 'fig'), fruit2 = c('banana', 'apple', 'apple', 'cherry'))`. In that case row 2 is a unique combination, but it is filtered out because one of its elements is found in the other column in another row. – Jon Spring Feb 19 '21 at 08:52
  • @JonSpring is correct. This should be fixed with `df %>% rowwise() %>% filter(...) %>% ungroup()`, but that could make it slower. – Phil Feb 19 '21 at 15:24
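
To check the point raised in these comments, here is a base R sketch run on Jon Spring's counterexample. It compares three tests: the whole-column `%in%` test, a row-by-row version of the same test (my assumption of what the expression reduces to once each row is evaluated on its own, emulated here with `mapply`), and an order-independent key. The names `df2`, `keep_cols`, `keep_rows` and `keep_key` are only illustrative.

```r
# Counterexample from the comments: rows 1 and 3 hold the same pair,
# rows 2 and 4 each hold a pair that occurs only once.
df2 <- data.frame(fruit1 = c("apple", "cherry", "banana", "fig"),
                  fruit2 = c("banana", "apple", "apple", "cherry"),
                  stringsAsFactors = FALSE)

# Whole-column membership test (base R equivalent of the first filter() call)
keep_cols <- !(df2$fruit1 %in% df2$fruit2) | !(df2$fruit2 %in% df2$fruit1)

# Row-by-row membership test (assumption: each row evaluated on its own);
# this is TRUE whenever the two cells of a row differ
keep_rows <- unname(mapply(function(a, b) !(a %in% b) | !(b %in% a),
                           df2$fruit1, df2$fruit2))

# Order-independent key: keep rows whose key occurs exactly once
key <- paste(pmin(df2$fruit1, df2$fruit2), pmax(df2$fruit1, df2$fruit2))
keep_key <- !(duplicated(key) | duplicated(key, fromLast = TRUE))

which(keep_cols)  # 4
which(keep_rows)  # 1 2 3 4
which(keep_key)   # 2 4
```

On this input only the key-based test selects rows 2 and 4. The row-by-row test keeps every row whose two cells differ, so it may be worth verifying the `rowwise()` variant on your own data before relying on it.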
2

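A base R way:

# Build an order-independent key: smaller value first, larger value second
vec <- with(df, paste(pmin(fruit1, fruit2), pmax(fruit1, fruit2)))
# Drop every row whose key occurs more than once (scan from both ends)
df[!(duplicated(vec) | duplicated(vec, fromLast = TRUE)), ]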

#   fruit1 fruit2
#3    fig cherry
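
It may help to look at the intermediate values. The sketch below rebuilds the question's `df` and prints the key and both `duplicated()` scans; `dup_fwd` and `dup_bwd` are illustrative names, not part of the answer's code.

```r
df <- data.frame(fruit1 = c("apple", "banana", "fig"),
                 fruit2 = c("banana", "apple", "cherry"),
                 stringsAsFactors = FALSE)

# Same key as above: the two values of each row in sorted order
vec <- with(df, paste(pmin(fruit1, fruit2), pmax(fruit1, fruit2)))
vec      # "apple banana" "apple banana" "cherry fig"

dup_fwd <- duplicated(vec)                   # FALSE  TRUE FALSE
dup_bwd <- duplicated(vec, fromLast = TRUE)  #  TRUE FALSE FALSE

# Using only the forward scan keeps the first row of each repeated pair
first_only <- df[!dup_fwd, ]              # rows 1 and 3

# Combining both scans removes every row of a repeated pair
none_repeated <- df[!(dup_fwd | dup_bwd), ]  # row 3 only
```

So `!duplicated(vec)` alone keeps one representative of each pair, while the combined condition drops all rows of any pair that appears more than once.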
Ronak Shah
  • @Ronak Shah, thanks for your reply, but something goes wrong when I use your method on my real data, so I have updated my question. – zhiwei li Feb 19 '21 at 08:02
  • @zhiweili 1) You have not used the complete code from my answer. 2) In the data frame you shared, all the values are duplicates, so everything is removed from the data. – Ronak Shah Feb 19 '21 at 09:09
  • Hi @Ronak Shah, I have posted a new question at https://stackoverflow.com/questions/72023327/can-not-use-pivot-longer-in-r-with-multile-cell-value-in-r. I think maybe you could help me. Thanks a lot. – zhiwei li Apr 27 '22 at 04:37
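
For the real-data case in the question update, one possible reading is that each unordered pollutant pair should be kept once per `lag` value. Under that assumption, a minimal base R sketch is to put `lag` into the key, so that rows with lag = 2 are no longer treated as repeats of the lag = 1 rows. The single `pollution` vector and the three-part key are my own simplifications, not code from the question.

```r
pollution <- c("pm2.5", "pm10", "so2", "no2", "o3", "co")

# All ordered pairs for each lag, as character columns
frames <- expand.grid(pollution1 = pollution,
                      pollution2 = pollution,
                      dis = "n",
                      lag = 1:2,
                      stringsAsFactors = FALSE)
frames <- frames[frames$pollution1 != frames$pollution2, ]

# Assumption: a pair counts as a repeat only within the same lag,
# so lag becomes part of the order-independent key
key <- with(frames, paste(pmin(pollution1, pollution2),
                          pmax(pollution1, pollution2),
                          lag))

# Keep the first row of each (pair, lag) combination
frames2 <- frames[!duplicated(key), ]

nrow(frames2)       # 30
table(frames2$lag)  # 15 rows for lag 1, 15 rows for lag 2
```

This uses only `!duplicated(key)`, because the goal here appears to be one representative per pair rather than removing repeated pairs entirely.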
1

Here's a low-tech dplyr approach. Make a sorted key, then keep rows with unique keys.

library(dplyr)
df %>%
    mutate(key = paste(pmin(fruit1, fruit2), pmax(fruit1, fruit2))) %>%
    add_count(key) %>%
    filter(n == 1)
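
If dplyr is not available, the same two steps (sorted key, then keep keys that occur once) can be sketched in base R. Here `ave()` stands in for `add_count()`; that substitution and the names `n` and `result` are my own choices, not part of the answer.

```r
df <- data.frame(fruit1 = c("apple", "banana", "fig"),
                 fruit2 = c("banana", "apple", "cherry"),
                 stringsAsFactors = FALSE)

# Step 1: sorted key, identical for (a, b) and (b, a)
key <- paste(pmin(df$fruit1, df$fruit2), pmax(df$fruit1, df$fruit2))

# Step 2: number of rows sharing each key, aligned with the rows of df
n <- ave(seq_along(key), key, FUN = length)

# Step 3: keep rows whose key occurs exactly once
result <- df[n == 1, ]
result
```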
Jon Spring